The present invention pertains to the field of biology. More particularly, the subject of the present invention is the identification of a nucleotide sequence which makes it possible, in particular, to distinguish an infection resulting from Mycobacterium tuberculosis from an infection resulting from Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis or Mycobacterium bovis BCG. The subject of the present invention is also a method for detecting the sequences in question or the products of their expression, and the kits for carrying out these methods. Finally, the subject of the present invention is novel vaccines.

Despite more than a century of research since the discovery of Mycobacterium tuberculosis, the aetiological agent of tuberculosis, this disease remains one of the major causes of human mortality. M. tuberculosis is expected to kill 3 million people annually (Snider, 1989 Rev. Inf. Dis. S335), and the number of people newly infected each year is rising and is estimated at 8.8 million. Although the majority of these are in developing countries, the disease is assuming renewed importance in Western countries due to the increasing number of homeless people, the impact of the AIDS epidemic, and changing global migration and travel patterns.

Early tuberculosis often goes unrecognized in an otherwise healthy individual. Classical initial methods of diagnosis include examination of a sputum smear under a microscope for acid-fast mycobacteria and an x-ray of the lungs. However, in the vast majority of cases the sputum smear examination is negative for mycobacteria in the early stages of the disease, and lung changes may not be obvious on an x-ray until several months after infection. Another complicating factor is that acid-fast bacteria in a sputum smear may often belong to other species of mycobacteria. Antibiotics used for treating tuberculosis have considerable side effects and must be taken as a combination of three or more drugs for a six- to twelve-month period. In addition, the risk of inducing drug-resistant tuberculosis prevents therapy from being administered without solid evidence to support the diagnosis. Currently, the only absolutely reliable method of diagnosis is based on culturing M. tuberculosis from the clinical specimen and identifying it morphologically and biochemically. This usually takes anywhere from three to six weeks, during which time a patient may become seriously ill and infect other individuals. Therefore, a rapid test capable of reliably detecting the presence of M. tuberculosis is vital for early detection and treatment. Several molecular tests have been developed recently for the rapid detection and identification of M. tuberculosis, such as the Gen-Probe "Amplified Mycobacterium tuberculosis Direct Test"; this test amplifies M. tuberculosis 16S ribosomal RNA from
respiratory specimens and uses a chemiluminescent probe to detect the amplified product, with a reported sensitivity of about 91%. The discovery of the IS6110 insertion element (Cave et al.; Eisenach et al., 1990 J. Infectious Diseases 161:977-981; Thierry et al., 1990 J. Clin. Microbiol. 28:2668-2673) and the belief that this element may only be present in the Mycobacterium complex (M. tuberculosis, M. bovis, M. bovis BCG, M. africanum and M. microti) spawned a whole series of rapid diagnostic strategies (Brisson-Noel et al., 1991 Lancet 338:364-366; Clarridge et al., 1993 J. Clin. Microbiol. 31:2049-2056; Cormican et al., 1992 J. Clin. Pathol. 45:601-604; Cousins et al., 1992 J. Clin. Microbiol. 30:255-258; Del Portillo et al., 1991 J. Clin. Microbiol. 29:2163-2168; Folgueira et al., 1994 Neurology 44:1336-1338; Forbes et al., 1993 J. Clin. Microbiol. 31:1688-1694; Hermans et al., 1990 J. Clin. Microbiol. 28:1204-1213; Kaltwasser et al., 1993 Mol. Cell. Probes 7:465-470; Kocagoz et al., 1993 J. Clin. Microbiol. 31:1435-1438; Kolk et al., 1992 J. Clin. Microbiol. 30:2567-2575; Kox et al., 1994 J. Clin. Microbiol. 32:672-678; Liu et al., 1994 Neurology 44:1161-1164; Miller et al., 1994 J. Clin. Microbiol. 32:393-397; Reischl et al., 1994 Biotechniques 17:844-845; Schluger et al., 1994 Chest 105:1116-1121; Shawar et al., 1993 J. Clin. Microbiol. 31:61-65; Wilson et al., 1993 J. Clin. Microbiol. 28:2668-2673). These tests employ various techniques to extract DNA from the sputum. PCR is used to amplify IS6110 DNA sequences from the extracted DNA, and successful amplification of this DNA is considered an indicator of the presence of M. tuberculosis infection. U.S. Pat. Nos. 5,168,039 and 5,370,998 have been issued to Crawford et al. for IS6110-based detection of tuberculosis. European patent EP 0 461 045 has been issued to Guesdon for IS6110-based detection of tuberculosis.

Thus, the molecular assays used to detect M. tuberculosis depend on the IS6110 insertion sequence (about 10 copies) or on the 16S ribosomal RNA (thousands of copies). However, these methods do not provide any information regarding the sub-type of the mycobacteria. Indeed, several dozen species of mycobacteria are known, and most are non-pathogenic for humans; tuberculosis is usually caused by infection with M. tuberculosis, with a few cases being caused by M. bovis and M. africanum. In order to choose an appropriate treatment and to conduct epidemiological investigations, it is absolutely necessary to be able to rapidly and accurately identify isolates, i.e. to distinguish the sub-type of mycobacteria of the complex originating from potential tuberculosis patients. That is the problem the present invention intends to solve.

The present invention provides an isolated or purified nucleic acid, wherein said nucleic acid is selected from the group consisting of:



a) SEQ ID N°1, named TbD1 (M. tuberculosis specific deletion 1);

b) a nucleic acid having a sequence fully complementary to SEQ ID N°1;

c) a nucleic acid fragment comprising at least 8, 15, 20, 25, 30, 50, 100, 250, 500, 750, 1000, 1500, 2000, 2500 or 3000 consecutive nucleotides of SEQ ID N°1;

d) a nucleic acid having at least 90% sequence identity after optimal alignment with a sequence defined in a) or b);

e) a nucleic acid that hybridizes under stringent conditions with the nucleic acid defined in a) or b).
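Items b) and c) above can be made concrete with a short sketch. The Python below is illustrative only: the helper names are hypothetical, and the example sequence is simply the first 18 nucleotides of the first TbD1 flanking sequence quoted later in this description, not SEQ ID N°1 itself. It builds the fully complementary strand (written 5' to 3') and counts how many fragments of k consecutive nucleotides a sequence of a given length contains.

```python
# Illustrative sketch: complementary strand and k-nt fragments of a DNA sequence.
# Helper names are hypothetical; no sequence from the listing is reproduced here.
from typing import Iterator

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the fully complementary strand, read 5' -> 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def fragments(seq: str, k: int) -> Iterator[str]:
    """Yield every run of k consecutive nucleotides of seq."""
    if k <= 0:
        raise ValueError("k must be positive")
    for start in range(len(seq) - k + 1):
        yield seq[start:start + k]


def fragment_count(length: int, k: int) -> int:
    """Number of k-nt fragments (start positions) in a sequence of `length` nt."""
    return max(length - k + 1, 0)


# First 18 nt of a flanking sequence quoted in the text (illustration only).
flank = "GGCCTGGTCAAACGCGGC"
print(reverse_complement(flank))   # GCCGCGTTTGACCAGGCC
print(fragment_count(2153, 20))    # 2134 twenty-nucleotide fragments in 2,153 nt
```

The fragment count is simply length - k + 1, which is why even the 2,153 nt region offers thousands of candidate 20-mers for primer or probe selection.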

As used herein, the terms "isolated" and "purified" according to the invention refer to a level of purity that is achievable using current technology. The molecules of the invention do not need to be absolutely pure (i.e., contain absolutely no molecules of other cellular macromolecules), but should be sufficiently pure that one of ordinary skill in the art would recognize that they are no longer present in the environment in which they were originally found (i.e., the cellular milieu). Thus, a purified or isolated molecule according to the present invention is one that has been removed from at least one other macromolecule present in the natural environment in which it was found. More preferably, the molecules of the invention are essentially purified and/or isolated, which means that the composition in which they are present is almost completely, or even absolutely, free of other macromolecules found in the environment in which the molecules of the invention are originally found. Isolation and purification thus do not occur by the addition or removal of salts, solvents, or elements of the periodic table, but must include the removal of at least some macromolecules. The nucleic acids encompassed by the invention are purified and/or isolated by any appropriate technique known to the ordinary artisan; such techniques are widely known, commonly practiced, and well within the skill of the ordinary artisan. As used herein, the term "nucleic acid" refers to a polynucleotide sequence such as a single- or double-stranded DNA sequence, RNA sequence or cDNA sequence; such a polynucleotide sequence has been isolated, purified or synthesized and may be constituted of natural or non-natural nucleotides. In a preferred embodiment, the DNA molecule of the invention is a double-stranded DNA molecule. As used herein, the terms "nucleic acid", "oligonucleotide" and "polynucleotide" have the same meaning and are used interchangeably.

By the term "Mycobacterium complex" as used herein is meant the complex of mycobacteria causing tuberculosis, namely Mycobacterium tuberculosis, Mycobacterium bovis, Mycobacterium africanum, Mycobacterium microti, Mycobacterium canettii and the vaccine strain Mycobacterium bovis BCG.

The present invention encompasses not only the entire sequence SEQ ID N°1, its complement and its double-stranded form, but also any fragment of this sequence, its complement and its double-stranded form.

In embodiments, the fragment of SEQ ID N°1 comprises at least approximately 8 nucleotides. For example, the fragment can be between approximately 8 and 30 nucleotides long and used as a primer for polynucleotide synthesis. In another preferred embodiment, the fragment of SEQ ID N°1 comprises between approximately 1,500 and approximately 2,500 nucleotides, and more preferably 2,153 nucleotides corresponding to SEQ ID N°4. As used herein, "nucleotides" refers to the number of nucleotides on a single-stranded nucleic acid. However, the term also encompasses double-stranded molecules. Thus, a fragment comprising 2,153 nucleotides according to the invention is a single-stranded molecule comprising 2,153 nucleotides, and also a double-stranded molecule comprising 2,153 base pairs (bp).

In a preferred embodiment, the nucleic acid fragment of the invention is specifically deleted in the genome of Mycobacterium tuberculosis, except in Mycobacterium tuberculosis strains having the mutation CTG -> CGG at codon 463 of the katG gene and/or having no or very few IS6110 sequences inserted in their genome, and is present in the genome of Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis and Mycobacterium bovis BCG. By the term "few IS6110 sequences inserted in the genome" is meant fewer than ten copies in the genome of M. tuberculosis, more preferably fewer than 5 copies, for example fewer than two copies.

The nucleic acid fragment of the invention is preferably selected from the group consisting of:

a) SEQ ID N°4;

b) a nucleic acid having a sequence fully complementary to SEQ ID N°4;

c) a nucleic acid fragment comprising at least 20, 25, 30, 50, 100, 250, 500, 750, 1000, 1500, 2000, 2500 or 3000 consecutive nucleotides of SEQ ID N°4;

d) a nucleic acid having at least 90% sequence identity after optimal alignment with a sequence defined in a) or b);

e) a nucleic acid that hybridizes under stringent conditions with the nucleic acid defined in a) or b).

In embodiments, the stringent conditions under which a sequence according to the invention is determined are conditions which are no less stringent than 5x SSPE,
2x Denhardt's solution and 0.5% (w/v) sodium dodecyl sulfate at 65°C. More stringent conditions can be utilized by the ordinary artisan, and the proper conditions for a given assay can be easily and rapidly determined without undue or excessive experimentation. As an illustrative embodiment, the stringent hybridization conditions used in order to specifically detect a polynucleotide according to the present invention are advantageously the following: pre-hybridization and hybridization are performed at 65°C in a mixture containing:

- 5X SSPE (1X SSPE is 3 M NaCl, 30 mM tri-sodium citrate)

- 2X Denhardt's solution

- 0.5% (w/v) sodium dodecyl sulfate (SDS)

- 100 µg ml-1 salmon sperm DNA.

The washings are performed as follows:

- two washings at laboratory temperature (approximately 21-25°C) for 10 min, in the presence of 2X SSPE and 0.1% SDS; and

- one washing at 65°C for 15 min, in the presence of 1X SSPE and 0.1% SDS.

The invention also encompasses the isolated or purified nucleic acid of the invention wherein said nucleic acid comprises at least a deletion of a nucleic acid fragment as defined above.

Polynucleotides of the invention can be characterized by the percentage of identity they show with the sequences disclosed herein. For example, polynucleotides having at least 90% identity with the polynucleotides of the invention, particularly those sequences of the sequence listing, are encompassed by the invention. More preferably, they show at least 92% identity, for example 95% or 99% identity. The skilled artisan can identify sequences according to the invention through the use of the sequence analysis software BLAST (see, for example, Coffin et al., eds., "Retroviruses", Cold Spring Harbor Laboratory Press, pp. 723-755). Percent identity is calculated using the BLAST sequence analysis program suite, version 2, available at the NCBI (NIH), with all default parameters. BLAST (Basic Local Alignment Search Tool) is the heuristic search algorithm employed by the programs blastp, blastn, blastx, tblastn and tblastx, all of which are available through the BLAST analysis software suite at the NCBI. These programs ascribe significance to their findings using the statistical methods of Karlin and Altschul (1990, 1993) with a few enhancements. Using this publicly available sequence analysis program suite, the skilled artisan can easily identify polynucleotides according to the present invention.
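As a rough illustration of what a percent-identity figure measures, the sketch below counts identical columns in an alignment that has already been produced (for example by BLAST) and divides by the alignment length, gap columns included, which is broadly how BLAST reports its "Identities" line. It does not reimplement BLAST's search or scoring; the function names and example alignments are hypothetical.

```python
# Sketch: percent identity of a precomputed pairwise alignment ('-' marks a gap).


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns divided by total alignment columns, times 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper()) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, minimum: float = 90.0) -> bool:
    """True if the alignment reaches the identity threshold (90% by default)."""
    return percent_identity(aligned_a, aligned_b) >= minimum


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0 (9 of 10 columns identical)
print(meets_threshold("ACG--ACGTA", "ACGTTACGTA"))   # False (8 of 10 columns -> 80.0)
```

Other tools divide by the shorter sequence length or ignore gap columns, so identity values are only comparable when computed the same way.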
It is well within the skill of the ordinary artisan to identify regions of the nucleic acid sequence of the invention which would be useful as a probe, primer, or other experimental, diagnostic, or therapeutic aid. For example, the ordinary artisan could utilize any of the widely available sequence analysis programs to select regions (fragments) of these sequences that are useful for hybridization assays such as Southern blots, Northern blots, DNA binding assays, and/or in vitro, in situ, or in vivo hybridizations. Additionally, the ordinary artisan, with the sequences of the present invention, can utilize widely available sequence analysis programs to select fragments to be used as probes and primers, as well as for the design of anti-sense molecules. The only practical limitation on the fragment chosen by the ordinary artisan is the ability of the fragment to be useful for the purpose for which it is chosen. For example, if the ordinary artisan wished to choose a hybridization probe, he would know how to choose one of sufficient length and sufficient stability to give meaningful results. The conditions chosen would be those typically used in hybridization assays developed for nucleic acid fragments of the approximate chosen length.

Thus, the present invention provides short oligonucleotides, such as those useful as probes and primers. In embodiments, the probe and/or primer comprises 8 to 30 consecutive nucleotides of the polynucleotide according to the invention or of the polynucleotide complementary thereto. Advantageously, a fragment as defined herein has a length of at least 8 nucleotides, which is approximately the minimal length that has been determined to allow specific hybridization. Preferably, the nucleic acid fragment has a length of at least 12 nucleotides, and more preferably 20 consecutive nucleotides, of either SEQ ID N°1 or SEQ ID N°3. The sequence of the oligonucleotide can be any of the many possible sequences according to the invention. Preferably, the sequence is selected from the group consisting of SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, SEQ ID N°16, SEQ ID N°17 and SEQ ID N°18. More precisely, the primer pairs SEQ ID N°13/SEQ ID N°14 and SEQ ID N°15/SEQ ID N°16 are specific for the nucleic acid fragment SEQ ID N°4. The primer pair SEQ ID N°17/SEQ ID N°18 is specific for the nucleic acid sequence SEQ ID N°1 and flanks the nucleic acid fragment SEQ ID N°4.
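To show what it means for a primer pair to flank a fragment, the sketch below performs a naive in-silico PCR. The forward primer must occur verbatim on the top strand, and the reverse primer is looked up, as its reverse complement, downstream of it. The primer sequences are SEQ ID N°17 and N°18 as given later in this description; the template is a hypothetical construct, so the 50 bp result says nothing about the size of the real TbD1 product. Real primer-design tools also tolerate mismatches and check melting temperatures, which this exact-match sketch does not.

```python
# Naive in-silico PCR: exact primer matches only; the template is hypothetical.
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Length of the product spanning both primer sites, or None if no product."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the top strand as its reverse complement.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start


FWD = "CTACCTCATCTTCCGGTCCA"  # SEQ ID N°17 (spaces removed)
REV = "CATAGATCCCGGACATGGTG"  # SEQ ID N°18 (spaces removed)

# Hypothetical template: primer site, 10 nt spacer, reverse-primer site.
toy_template = "GGGG" + FWD + "T" * 10 + reverse_complement(REV) + "CCCC"
print(amplicon_length(toy_template, FWD, REV))  # 50
```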

Thus, the polynucleotides of SEQ ID N°1 and SEQ ID N°4, and their fragments, can be used to select nucleotide primers, notably for an amplification reaction, such as the amplification reactions further described.

PCR is described in US Patent No. 4,683,202, which is incorporated herein in its entirety. The amplified fragments may be identified by agarose or polyacrylamide gel electrophoresis, by capillary electrophoresis, or alternatively by a chromatography technique (gel filtration, hydrophobic chromatography, or ion exchange chromatography). The specificity of the amplification can be ensured by molecular hybridization using as nucleic probes the polynucleotides of SEQ ID N°1 or SEQ ID N°4 and their fragments, oligonucleotides that are complementary to these polynucleotides or fragments thereof, or their amplification products themselves, or even by DNA sequencing.

The following other techniques related to nucleic acid amplification may also be used and are generally preferred to the PCR technique. The Strand Displacement Amplification (SDA) technique is an isothermal amplification technique based on the ability of a restriction enzyme to cleave one of the strands at a recognition site (which is in a hemiphosphorothioate form), on the property of a DNA polymerase to initiate the synthesis of a new strand from the 3'-OH end generated by the restriction enzyme, and on the property of this DNA polymerase to displace the previously synthesized strand located downstream. The SDA amplification technique is more easily performed than PCR (a single thermostatted water bath device is necessary) and is faster than the other amplification methods. Thus, the present invention also comprises using the nucleic acid fragments according to the invention (primers) in a method of DNA or RNA amplification according to the SDA technique.

When the target polynucleotide to be detected is an RNA, for example an mRNA, a reverse transcriptase enzyme will be used before the amplification reaction in order to obtain a cDNA from the RNA contained in the biological sample. The generated cDNA is subsequently used as the nucleic acid target for the primers or the probes used in an amplification process or a detection process according to the present invention.

The non-labeled polynucleotides or oligonucleotides of the invention can be used directly as probes. Nevertheless, the polynucleotides or oligonucleotides are generally labeled with a radioactive element (32P, 35S, 3H, 125I) or with a non-isotopic molecule (for example, biotin, acetylaminofluorene, digoxigenin, 5-bromodeoxyuridine, fluorescein) in order to generate probes that are useful for numerous applications. Examples of non-radioactive labeling of nucleic acid fragments are described in French patent No. FR 78 10975 and by Urdea et al. (1988, Nucleic Acids Research 11:4937-4957) or Sanchez-Pescador et al. (1988, J. Clin. Microbiol. 26(10):1934-1938), the disclosures of which are hereby incorporated in their entirety. Other labeling techniques can also be used, such as those described in French patents FR 2 422 956 and FR 2 518 755. The hybridization step may be performed in different ways. See, for example, Matthews et al., 1988, Anal. Biochem. 169:1-25. A general method comprises immobilizing the nucleic acid that has been extracted from the biological sample on a substrate (for example, nitrocellulose, nylon, polystyrene) and then incubating, under defined conditions, the target nucleic acid with the probe. Subsequent to the hybridization step, the excess amount of the specific probe is discarded and the hybrid molecules formed are detected by an appropriate method (measurement of radioactivity, fluorescence or enzyme activity, etc.).

Amplified nucleotide fragments are useful, among other things, as probes used in hybridization reactions in order to detect the presence of a polynucleotide according to the present invention or in order to detect mutations. The primers may also be used as oligonucleotide probes to specifically detect a polynucleotide according to the invention.

The oligonucleotide probes according to the present invention may also be used in a detection device comprising a matrix library of probes immobilized on a substrate, the sequence of each probe of a given length being shifted by one or several bases relative to the others, each probe of the matrix library thus being complementary to a distinct sequence of the target nucleic acid. Optionally, the substrate of the matrix may be a material able to act as an electron donor, the matrix positions at which hybridization has occurred being subsequently determined by an electronic device. Such matrix libraries of probes and methods of specific detection of a target nucleic acid are described in European patent application No. EP-0 713 016 (Affymax Technologies) and also in US patent US-5,202,231 (Drmanac). Since almost the whole length of a mycobacterial chromosome is covered by a BAC-based genomic DNA library (i.e. 97% of the M. tuberculosis chromosome is covered by the BAC library I-1945), these DNA libraries will play an important role in a plurality of post-genomic applications, such as mycobacterial gene expression studies, where the canonical set of BACs could be used as a matrix for hybridization studies. Thus, it is also within the scope of the invention to provide nucleic acid chips, more precisely DNA chips or protein chips, that respectively comprise a nucleic acid or a polypeptide of the invention.
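The matrix library described above, in which probes of a fixed length are offset from one another by one or several bases, amounts to tiling the target sequence. A minimal sketch follows, with a hypothetical function name and a toy target: each probe is the reverse complement of one window of the target, so every probe is complementary to a distinct stretch of it.

```python
# Sketch: tile probes of fixed length across a target, offset by `shift` bases.
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tile_probes(target: str, length: int, shift: int) -> List[Tuple[int, str]]:
    """Return (start position on target, probe sequence) for each tiled probe."""
    if length <= 0 or shift <= 0:
        raise ValueError("length and shift must be positive")
    probes = []
    for pos in range(0, len(target) - length + 1, shift):
        window = target[pos:pos + length]
        probes.append((pos, reverse_complement(window)))  # probe pairs with this window
    return probes


for pos, probe in tile_probes("ACGTTGCAAC", length=4, shift=3):
    print(pos, probe)
```

With a shift of 1, a target of n nucleotides needs n - length + 1 probes; larger shifts trade resolution for a smaller array.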

The present invention also provides a vector comprising the isolated DNA molecule of the invention. A "vector" is a replicon to which another polynucleotide segment is attached, so as to bring about the replication and/or expression of the attached segment. A vector can have one or more restriction endonuclease recognition sites at which the DNA sequences can be cut in a determinable fashion without loss of an essential biological function of the vector, and into which a DNA fragment can be spliced in order to bring about its replication and cloning. Vectors can further provide primer sites (e.g. for PCR), transcriptional and/or translational initiation and/or regulation sites, recombination signals, replicons, selectable markers, etc. Besides the use of homologous recombination or restriction enzymes to insert a desired DNA fragment into the vector, UDG cloning of PCR fragments (US Pat. No. 5,334,575), T:A cloning, and the like can also be applied. The cloning vector can further contain a selectable marker suitable for use in the identification of cells transformed with the cloning vector.

The vector can be any useful vector known to the ordinary artisan, including, but not limited to, a cloning vector, an insertion vector, or an expression vector. Examples of vectors include plasmids, phages, cosmids, phagemids, yeast artificial chromosomes (YAC), bacterial artificial chromosomes (BAC), human artificial chromosomes (HAC), viral vectors such as adenoviral and retroviral vectors, and other DNA sequences which are able to replicate or to be replicated in vitro or in a host cell, or to convey a desired DNA segment to a desired location within a host cell. According to a preferred embodiment of the invention, the recombinant vector is the BAC pBeloBAC11 in which the genomic region of Mycobacterium bovis BCG 1173P3 spanning the region corresponding to the locus 1,760,753 bp to 1,830,364 bp of the M. tuberculosis H37Rv genome has been inserted into the HindIII restriction site; this recombinant vector is named X229. In this region, the inventors have demonstrated the deletion of a 2153 bp fragment in the vast majority of M. tuberculosis strains. For this reason the inventors named this region TbD1 ("M. tuberculosis specific deletion 1"). This 2153 bp region is flanked by the sequences GGC CTG GTC AAA CGC GGC TGG ATG CTG and AGA TCC GTC TTT GAG ACG ATC GAC G. External primers hybridizing with these sequences or with their complementary sequences can be used to amplify the TbD1 region and so check for the presence or absence of the TbD1 deletion. For example, the inventors designed the following primers:

5'- CTA CCT CAT CTT CCG GTC CA-3' (SEQ ID N°17)

5'- CAT AGA TCC CGG ACA TGG TG-3' (SEQ ID N°18)
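Because the primers of SEQ ID N°17 and N°18 lie outside the 2153 bp TbD1 region, a strain lacking TbD1 should give a product about 2153 bp shorter than one carrying it. The sketch below turns an observed band size into a call under that assumption. The intact-product size is a hypothetical parameter that would have to be worked out from the actual primer positions in SEQ ID N°1, and the 50 bp sizing tolerance is likewise an assumption, not a value from this description.

```python
# Sketch: call TbD1 status from the size of the flanking-primer PCR product.
TBD1_LENGTH_BP = 2153  # size of the deleted region, as stated in the text


def call_from_band(observed_bp: int, intact_bp: int, tolerance_bp: int = 50) -> str:
    """Compare a band size with the expected intact and deleted product sizes.

    intact_bp and tolerance_bp are hypothetical inputs supplied by the user.
    """
    deleted_bp = intact_bp - TBD1_LENGTH_BP
    if deleted_bp <= 0:
        raise ValueError("intact product must be longer than the TbD1 region")
    if abs(observed_bp - intact_bp) <= tolerance_bp:
        return "TbD1 present"
    if abs(observed_bp - deleted_bp) <= tolerance_bp:
        return "TbD1 deleted"
    return "unexpected size"


# Hypothetical example: if the intact product were 2,400 bp, a deletion gives 247 bp.
print(call_from_band(2400, intact_bp=2400))
print(call_from_band(250, intact_bp=2400))
```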
In order to obtain a specific 500 bp probe for hybridization experiments, PCR amplification of an internal fragment may be performed using the plasmid X229 as a template. The amplification of a fragment of approximately 500 bp internal to the TbD1 region can be performed using the following primers:

5'- GOT TCA ACG CCA AAC AGG TA-3' (SEQ ID 13) 

5'- AAT GGA ACT CGT GGA ACA CC-3' (SEQ ID N°14)
The amplification of a fragment of approximately 2,000 bp internal to the TbD1 region can be performed using the following primers:

5'- ATT GAG CGT CTA TGG GTT GG-3' (SEQ ID N°15)
5'- AGC AGO TCG GGA TAT CGT AG-3' (SEQ ID N°16)
The PCR conditions are the following: denaturation at 95°C for 1 min, then 35 cycles of amplification [95°C for 30 seconds, 58°C for 1 min], then elongation at 72°C for 4 min.
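The thermocycling program above can be written down as data, which also makes its minimum run time easy to check. In this sketch the class and step names, and the reading of the 58°C step as the annealing step, are assumptions; ramp times between temperatures are ignored, so a real instrument will take longer.

```python
# Sketch: the PCR program from the text as data, plus total hold time.
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class Program:
    initial: Step
    cycle: Tuple[Step, ...]
    cycles: int
    final: Step

    def hold_seconds(self) -> int:
        """Sum of all hold times; ignores heating/cooling ramps."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.cycles * per_cycle + self.final.seconds


TBD1_PCR = Program(
    initial=Step("initial denaturation", 95, 60),
    cycle=(Step("denaturation", 95, 30), Step("annealing (58 °C step)", 58, 60)),
    cycles=35,
    final=Step("final elongation", 72, 240),
)

total = TBD1_PCR.hold_seconds()
print(total, "s =", total / 60, "min")  # 3450 s = 57.5 min
```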

Thus, this invention also concerns a recombinant host cell which contains a polynucleotide or recombinant vector according to the invention. The host cell can be transformed or transfected with a polynucleotide or recombinant vector to provide transient, stable or controlled expression of the desired polynucleotide. For example, the polynucleotide of interest can be subcloned into an expression plasmid at a cloning site downstream from a promoter in the plasmid, and the plasmid can be introduced into a host cell where expression can occur. The recombinant host cell can be any suitable host known to the skilled artisan, such as a eukaryotic cell or a microorganism. For example, the host can be a cell selected from the group consisting of Escherichia coli, [...] and yeasts. According to a preferred embodiment of the invention, the recombinant host cell is a commercially available Escherichia coli DH10B (Gibco) containing the BAC named X229 previously described. This Escherichia coli DH10B (Gibco) containing the BAC named X229 has been deposited with the Collection Nationale de Cultures de Microorganismes (CNCM), Institut Pasteur, Paris, France, on February 18, 2002 under number CNCM I-2799.

Another aspect of the invention is the isolated or purified polypeptides encoded by a polynucleotide of the invention. The purified polypeptide comprises an amino acid sequence selected from SEQ ID N°6, N°8, N°10 and N°12, and fragments thereof. For example, the purified polypeptide of the invention can comprise the amino acid sequence of SEQ ID N°6, which is the amino acid sequence of the mmpL6 protein, or the amino acid sequence of SEQ ID N°8, a truncated form of mmpL6. The purified polypeptide of the invention can comprise the amino acid sequence of SEQ ID N°10, which is the amino acid sequence of the mmpS6 protein, or the amino acid sequence of SEQ ID N°12, a truncated form of mmpS6.

It is now easy to produce proteins in large amounts by genetic engineering techniques through the use of expression vectors, such as plasmids, phages and phagemids. The polypeptide of the present invention can be produced by insertion of the appropriate polynucleotide into an appropriate expression vector at the appropriate position within the vector. Such manipulation of polynucleotides is well known and widely practiced by the ordinary artisan. The polypeptide can be produced from these recombinant vectors either in vitro or in vivo. All isolated or purified nucleic acids encoding a polypeptide of the invention are within the scope of the invention. The polypeptide of the invention is a polypeptide encoded by a polynucleotide which hybridizes to either SEQ ID N°1 or SEQ ID N°4 under stringent conditions, as defined herein.

More preferably, said isolated or purified nucleic acid according to the invention is selected from among:

- SEQ ID N°5, encoding the polypeptide of SEQ ID N°6;

- SEQ ID N°7, encoding the polypeptide of SEQ ID N°8;

- SEQ ID N°9, encoding the polypeptide of SEQ ID N°10;

- SEQ ID N°11, encoding the polypeptide of SEQ ID N°12.
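The relationship between each coding sequence and its polypeptide (for example SEQ ID N°5 and SEQ ID N°6) follows the genetic code. The sketch below translates a coding sequence using the standard codon assignments; the bacterial code (NCBI translation table 11) assigns the same amino acids and differs only in its start codons, and as a simplification this sketch reads only an initial GTG or TTG as methionine. The function name and test sequences are hypothetical, and none of the sequences of the listing are reproduced.

```python
# Sketch: translate a coding sequence (standard code, bacterial-style start codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
ALT_STARTS = {"GTG", "TTG"}  # read as Met only when they are the first codon


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate codon by codon; '*' marks a stop, 'X' an unrecognised codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        residue = CODON_TABLE.get(codon, "X")
        if i == 0 and codon in ALT_STARTS:
            residue = "M"
        if residue == "*" and to_stop:
            break
        protein.append(residue)
    return "".join(protein)


print(translate("GTGAAATGA"))                 # MK
print(translate("ATGGCTTAA", to_stop=False))  # MA*
```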

The present invention also provides a method for the discriminatory detection and identification of:

- Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the mutation CTG -> CGG at codon 463 of the katG gene and/or Mycobacterium tuberculosis strains having no or very few IS6110 sequences inserted in their genome; versus

- Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis and Mycobacterium bovis BCG,

in a biological sample, comprising the following steps:

a) isolation of the DNA from the biological sample to be analyzed, or production of a cDNA from the RNA of the biological sample;

b) detection of the nucleic acid sequences of the mycobacterium present in said biological sample;

c) analysis for the presence or the absence of a nucleic acid fragment as previously described.
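Step c) reduces to a presence/absence call on the TbD1 region. One way to make that call more robust is to combine an internal product (for example with SEQ ID N°13/N°14, expected only when TbD1 is present) with a short product from the flanking primers SEQ ID N°17/N°18, expected only when TbD1 is absent. The sketch below assumes that two-read-out design, which is an illustration rather than the claimed method; discordant or empty results are reported as inconclusive, and the species groups are those listed above, including the excepted M. tuberculosis strains.

```python
# Sketch: combine two hypothetical PCR read-outs into a TbD1 call and a reading.

GROUPS = {
    "TbD1 deleted": "Mycobacterium tuberculosis",
    "TbD1 present": (
        "M. africanum, M. canettii, M. microti, M. bovis or M. bovis BCG, "
        "or one of the excepted M. tuberculosis strains"
    ),
}


def call_tbd1(internal_product: bool, short_junction_product: bool) -> str:
    """internal_product: band from primers inside TbD1.
    short_junction_product: short flanking-primer band implying the deletion."""
    if internal_product and not short_junction_product:
        return "TbD1 present"
    if short_junction_product and not internal_product:
        return "TbD1 deleted"
    # Both bands (e.g. a mixed sample) or neither band (no target, failed PCR).
    return "inconclusive"


def interpret(internal_product: bool, short_junction_product: bool) -> str:
    call = call_tbd1(internal_product, short_junction_product)
    return GROUPS.get(call, "repeat or confirm by another method")


print(interpret(False, True))  # Mycobacterium tuberculosis
```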

By a biological sample according to the present invention is notably intended a biological fluid, such as sputum, saliva, plasma, blood, urine or sperm, or a tissue, such as a biopsy.

Analysis of the desired sequences may, for example, be carried out by agarose gel electrophoresis. If a DNA fragment migrating to the expected position is observed, it can be concluded that the analyzed sample contained mycobacterial DNA. This analysis can also be carried out by the molecular hybridization technique using a nucleic probe. This probe will advantageously be labeled with a nonradioactive (cold probe) or radioactive element. Advantageously, the detection of the mycobacterial DNA sequences will be carried out using nucleotide sequences complementary to said DNA sequences. By way of example, these may include labeled or nonlabeled nucleotide probes; they may also include primers for amplification. The amplification technique used may be PCR, but also other alternative techniques such as SDA (Strand Displacement Amplification), TAS (Transcription-based Amplification System), NASBA (Nucleic Acid Sequence Based Amplification) or TMA (Transcription Mediated Amplification).

The primers in accordance with the invention have a nucleotide sequence chosen from the group comprising SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, SEQ ID N°16, SEQ ID N°17 and SEQ ID N°18. The pairs SEQ ID N°13/SEQ ID N°14 and SEQ ID N°15/SEQ ID N°16 are specific for the nucleic acid fragment SEQ ID N°4, and the pair SEQ ID N°17/SEQ ID N°18 is specific for the nucleic acid of the invention.

In a variant, the subject of the invention is also a method for the discriminatory
detection and identification of:

- Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the
mutation CTG -> CGG at codon 463 of gene katG and/or Mycobacterium
tuberculosis strains having no or very few IS6110 sequences inserted in their genome; versus
- Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti,
Mycobacterium bovis, Mycobacterium bovis BCG in a biological sample,
comprising the following steps:

a) bringing the biological sample to be analyzed into contact with at least one
pair of primers as defined in claim 11 or 12, the DNA contained in the sample having been,
where appropriate, made accessible to hybridization beforehand,

b) amplification of the DNA of the mycobacterium, 

c) visualization of the amplification of the DNA fragments. 

The amplified fragments may be identified by agarose or polyacrylamide gel
electrophoresis, by capillary electrophoresis or by a chromatographic technique (gel filtration,
hydrophobic chromatography or ion-exchange chromatography). The specificity of the
amplification may be controlled by molecular hybridization using probes, plasmids
containing these sequences or their amplification product. The amplified nucleotide
fragments may be used as reagents in hybridization reactions in order to detect the presence,
in a biological sample, of a target nucleic acid having sequences complementary to those of
said amplified nucleotide fragments. These probes and amplicons may or may not be labeled,
with radioactive elements or with nonradioactive molecules such as enzymes or
fluorescent elements.
The subject of the present invention is also a kit for the discriminatory detection and
identification of:

- Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the
mutation CTG -> CGG at codon 463 of gene katG and/or Mycobacterium
tuberculosis strains having no or very few IS6110 sequences inserted in their genome; versus

- Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti,
Mycobacterium bovis, Mycobacterium bovis BCG,

in a biological sample, comprising the following elements:

a) at least one pair of primers as defined previously,

b) the reagents necessary to carry out a DNA amplification reaction,

c) optionally, the necessary components which make it possible to verify or
compare the sequence and/or the size of the amplified fragment.

Indeed, in the context of the present invention, very different results can be obtained
depending on the pair of primers used. Thus, the use of primers internal to the deletion,
such as for example SEQ ID No. 13, SEQ ID No. 14, SEQ ID No. 15 and SEQ ID No. 16, is such
that no amplification product is detectable in M. tuberculosis, whereas an amplification
product is detectable in Mycobacterium africanum, Mycobacterium canettii, Mycobacterium
microti, Mycobacterium bovis, Mycobacterium bovis BCG, and Mycobacterium tuberculosis
having the mutation CTG -> CGG at codon 463 of gene katG and/or having no or very few
IS6110 sequences inserted in their genome. However, the use of primers external to the
region of deletion does not necessarily give the same result, as regards for example the size
of the amplified fragment, depending on the size of the deleted region in M. tuberculosis.
Thus, the use of a pair of primers external to the deletion, such as SEQ ID No. 17 and
SEQ ID No. 18, is likely to give rise to an amplicon of about 2100 bp in Mycobacterium
africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis,
Mycobacterium bovis BCG, and Mycobacterium tuberculosis having the mutation CTG -> CGG
at codon 463 of gene katG and/or having no or very few IS6110 sequences inserted in their
genome, whereas the same external pair will give rise in M. tuberculosis to an amplicon of
only a few bp.
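Because the internal and the external (flanking) primer pairs report on the same locus in opposite ways, their results can be cross-checked before a strain is called. The sketch below illustrates that logic only; it is not the claimed method. The function and enum names are hypothetical, the full-length amplicon size is assumed to be measured on a reference strain run in parallel, and the 100 bp tolerance is an arbitrary stand-in for gel sizing error.

```python
from enum import Enum
from typing import Optional


class TbD1Call(Enum):
    PRESENT = "TbD1 present"        # internal product plus full-length flanking amplicon
    DELETED = "TbD1 deleted"        # no internal product plus shortened flanking amplicon
    INCONCLUSIVE = "inconclusive"   # the two primer pairs disagree: repeat or sequence


def call_tbd1(
    internal_product: bool,
    flanking_size_bp: Optional[int],
    intact_size_bp: int,
    tolerance_bp: int = 100,  # assumed sizing error, not taken from the text
) -> TbD1Call:
    """Combine an internal-pair result (e.g. SEQ ID No. 13/14) with a flanking-pair
    amplicon size (e.g. SEQ ID No. 17/18) into a single call for the TbD1 locus."""
    if flanking_size_bp is None:
        # Without a flanking product the internal result cannot be cross-checked.
        return TbD1Call.INCONCLUSIVE
    full_length = abs(flanking_size_bp - intact_size_bp) <= tolerance_bp
    shortened = flanking_size_bp < intact_size_bp - tolerance_bp
    if internal_product and full_length:
        return TbD1Call.PRESENT
    if not internal_product and shortened:
        return TbD1Call.DELETED
    return TbD1Call.INCONCLUSIVE


print(call_tbd1(False, 300, 2100))  # TbD1Call.DELETED
```

Requiring agreement between the two pairs mirrors the internal control described in the experimental section, where an internal product is expected to coincide with the absence of the shortened flanking product, and vice versa.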

More generally, the invention pertains to the use of at least one pair of primers as
defined previously for the amplification of a DNA sequence from Mycobacterium
tuberculosis or Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti,
Mycobacterium bovis, Mycobacterium bovis BCG, or Mycobacterium tuberculosis having the
mutation CTG -> CGG at codon 463 of gene katG and/or having no or very few IS6110
sequences inserted in their genome.

The subject of the invention is also the product of expression of all or part of the
nucleic acid fragment as defined previously, which is deleted from the genome of Mycobacterium
tuberculosis and present in Mycobacterium africanum, Mycobacterium canettii,
Mycobacterium microti, Mycobacterium bovis, Mycobacterium bovis BCG, and Mycobacterium
tuberculosis having the mutation CTG -> CGG at codon 463 of gene katG and/or having no
or very few IS6110 sequences inserted in their genome, or conversely. The expression
"product of expression" is understood to mean any protein, polypeptide or polypeptide
fragment resulting from the expression of all or part of the above-mentioned nucleotide
sequences. Among these products of expression are the membrane proteins mmpL6 and
mmpS6, and their truncated or rearranged forms resulting from the deletion of the fragment of
the invention.

Indeed, the subject of the present invention is also a method for the discriminatory
in vitro detection of antibodies directed against Mycobacterium tuberculosis or
Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium
bovis, Mycobacterium bovis BCG, or Mycobacterium tuberculosis having the mutation CTG ->
CGG at codon 463 of gene katG and/or having no or very few IS6110 sequences inserted in
their genome, in a biological sample, comprising the following steps:

a) bringing the biological sample into contact with at least one product as
previously defined,

b) detecting the antigen-antibody complex formed.

The subject of the present invention is also a method for the discriminatory detection
of a vaccination with Mycobacterium bovis BCG or an infection by
Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the
mutation CTG -> CGG at codon 463 of gene katG and/or Mycobacterium
tuberculosis strains having no or very few IS6110 sequences inserted in their genome, in a
mammal, comprising the following steps:

a) preparation of a biological sample containing cells, more particularly cells of
the immune system of said mammal and more particularly T cells,

b) incubation of the biological sample of step a) with at least one product as
previously defined,

c) detection of a cellular reaction indicating prior sensitization of the mammal to
said product, in particular cell proliferation and/or synthesis of proteins such as
gamma-interferon. Cell proliferation may be measured, for example, by incorporation of
3H-thymidine.
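A proliferation readout based on 3H-thymidine incorporation is often summarized as a stimulation index: mean counts per minute (cpm) in wells incubated with the product, divided by mean cpm in unstimulated wells. The sketch below shows that arithmetic under stated assumptions; the function names, the triplicate counts and the cut-off of 3.0 are hypothetical placeholders, not values from this document, and the responder threshold would have to be set by the laboratory.

```python
from statistics import mean
from typing import Sequence


def stimulation_index(stimulated_cpm: Sequence[float], unstimulated_cpm: Sequence[float]) -> float:
    """Mean 3H-thymidine counts of product-stimulated wells divided by mean background counts."""
    background = mean(unstimulated_cpm)
    if background <= 0:
        raise ValueError("background counts must be positive")
    return mean(stimulated_cpm) / background


def shows_sensitization(si: float, threshold: float = 3.0) -> bool:
    # Placeholder cut-off (assumption): not specified in the source text.
    return si >= threshold


# Hypothetical triplicate counts for one sample
si = stimulation_index([4500, 5100, 4800], [900, 1100, 1000])
print(round(si, 2), shows_sensitization(si))  # 4.8 True
```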

The invention also relates to a kit for the in vitro diagnosis of a Mycobacterium
tuberculosis infection, except infection with Mycobacterium tuberculosis strains having
the mutation CTG -> CGG at codon 463 of gene katG and/or having no or very few IS6110
sequences inserted in their genome, in a mammal optionally vaccinated beforehand with
M. bovis BCG, comprising:

a) a product as previously defined,

b) where appropriate, the reagents for the constitution of the medium suitable
for the immunological reaction,

c) the reagents allowing the detection of the antigen-antibody complexes
produced by the immunological reaction,

d) where appropriate, a reference biological sample (negative control) free of
antibodies recognized by said product,

e) where appropriate, a reference biological sample (positive control) containing a
predetermined quantity of antibodies recognized by said product.

The reagents allowing the detection of the antigen-antibody complexes may carry a marker
or may be capable of being recognized in turn by a labeled reagent, more particularly in the
case where the antibody used is not labeled.
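When the kit includes negative and positive reference samples, one way to use them is to derive a positivity cut-off from the negatives and to accept a run only if the positive control clears it. The sketch below assumes an ELISA-type read-out in optical density (OD) units and the widely used "mean of negatives plus k standard deviations" rule with k = 3; the read-out, the rule, the function names and all OD values are assumptions for illustration, not part of the described kit.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def od_cutoff(negative_od: Sequence[float], k: float = 3.0) -> float:
    """Cut-off = mean(negative controls) + k * SD (assumed convention; needs >= 2 negatives)."""
    return mean(negative_od) + k * stdev(negative_od)


def interpret_run(
    sample_od: Dict[str, float],
    negative_od: Sequence[float],
    positive_od: Sequence[float],
) -> Dict[str, str]:
    cutoff = od_cutoff(negative_od)
    # The run is only interpretable if the positive control is clearly reactive.
    if mean(positive_od) <= cutoff:
        raise ValueError("positive control below cut-off: run invalid")
    return {name: ("positive" if od > cutoff else "negative") for name, od in sample_od.items()}


# Hypothetical plate: negatives average 0.10 OD with SD 0.02, so the cut-off is about 0.16
print(interpret_run({"A": 0.45, "B": 0.11}, [0.10, 0.12, 0.08], [1.2, 1.3]))
# {'A': 'positive', 'B': 'negative'}
```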

The subject of the invention is also mono- or polyclonal antibodies, their chimeric 
fragments or antibodies, capable of specifically recognizing a product of expression in 
accordance with the present invention. 

The present invention therefore also relates to a method for the discriminatory
detection of the presence of an antigen of Mycobacterium tuberculosis or Mycobacterium
africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis,
Mycobacterium bovis BCG, or Mycobacterium tuberculosis having the mutation CTG ->
CGG at codon 463 of gene katG and/or having no or very few IS6110 sequences inserted in
their genome, in a biological sample, comprising the following steps:

a) bringing the biological sample into contact with an antibody of the invention, 

b) detecting the antigen-antibody complex formed. 

The invention also relates to a kit for the discriminatory detection of the presence
of an antigen of Mycobacterium tuberculosis or Mycobacterium africanum, Mycobacterium
canettii, Mycobacterium microti, Mycobacterium bovis, Mycobacterium bovis BCG, or
Mycobacterium tuberculosis having the mutation CTG -> CGG at codon 463 of gene katG
and/or having no or very few IS6110 sequences inserted in their genome, in a biological
sample, comprising the following elements:

a) an antibody as previously claimed,

b) the reagents for constituting the medium suitable for the immunological reaction,

c) the reagents allowing the detection of the antigen-antibody complexes produced
by the immunological reaction.

The above-mentioned reagents are well known to a person skilled in the art, who will
have no difficulty adapting them to the context of the present invention.

The subject of the invention is also an immunological composition, characterized in
that it comprises at least one product of expression in accordance with the invention.

Advantageously, the immunological composition in accordance with the invention
enters into the composition of a vaccine when it is provided in combination with a
pharmaceutically acceptable vehicle and optionally with one or more immunity adjuvant(s)
such as alum, a representative of the family of muramylpeptides, or incomplete Freund's
adjuvant.

The invention also relates to a vaccine comprising at least one product of expression
in accordance with the invention in combination with a pharmaceutically compatible vehicle
and, where appropriate, one or more appropriate immunity adjuvant(s).

The invention also provides an in vitro method for the detection and identification of
Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the
mutation CTG -> CGG at codon 463 of gene katG and/or Mycobacterium
tuberculosis strains having no or very few IS6110 sequences inserted in their genome, in a
biological sample, comprising the following steps:

a) isolation of the DNA from the biological sample to be analyzed or
production of a cDNA from the RNA of the biological sample,

b) detection of the nucleic acid sequences of the mycobacterium present in said
biological sample,

c) analysis for the presence or the absence of a nucleic acid fragment of the
invention.

In another embodiment, the invention provides an in vitro method for the detection
and identification of Mycobacterium tuberculosis, except Mycobacterium tuberculosis
strains having the mutation CTG -> CGG at codon 463 of gene katG and/or
Mycobacterium tuberculosis strains having no or very few IS6110 sequences inserted in their
genome, in a biological sample, comprising the following steps:

a) bringing the biological sample to be analyzed into contact with at least one pair of
primers selected among nucleic acid fragments of the invention, and more preferably
selected among the primers chosen from the group comprising SEQ ID No. 13, SEQ ID No. 14,
SEQ ID No. 15, SEQ ID No. 16, SEQ ID No. 17 and SEQ ID No. 18, the DNA contained in the
sample having been, where appropriate, made accessible to hybridization beforehand,

b) amplification of the DNA of the mycobacterium, 

c) visualization of the amplification of the DNA fragments. 

The invention also provides a kit for the detection and identification of
Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the
mutation CTG -> CGG at codon 463 of gene katG and/or Mycobacterium
tuberculosis strains having no or very few IS6110 sequences inserted in their genome, in a
biological sample, comprising the following elements:

a) at least one pair of primers selected among nucleic acid fragments of the
invention, and more preferably selected among the primers chosen from the group
comprising SEQ ID No. 13, SEQ ID No. 14, SEQ ID No. 15, SEQ ID No. 16, SEQ ID No. 17 and
SEQ ID No. 18,

b) the reagents necessary to carry out a DNA amplification reaction,

c) optionally, the necessary components which make it possible to verify or
compare the sequence and/or the size of the amplified fragment.

The invention also relates to a method for the in vitro detection of antibodies
directed against Mycobacterium tuberculosis, except Mycobacterium tuberculosis having
the mutation CTG -> CGG at codon 463 of gene katG and/or having no or very few IS6110
sequences inserted in their genome, in a biological sample, comprising the following steps:

a) bringing the biological sample into contact with at least one product as defined
previously,

b) detecting the antigen-antibody complex formed.

It is also a goal of the invention to use the TbD1 deletion as a genetic marker for
the differentiation of Mycobacterium strains of the Mycobacterium tuberculosis complex.

It is also a goal of the invention to use the mmpL6 polymorphism as a genetic marker
(see SEQ ID No. 20) for the differentiation of Mycobacterium strains of the Mycobacterium
tuberculosis complex.

The use of such genetic marker(s) in association with at least one genetic marker
selected among RD1, RD2, RD3, RD4, RD5, RD6, RD7, RD8, RD9, RD10, RD11, RD13,
RD14, RvD1, RvD2, RvD3, RvD4, RvD5, katG463, gyrA95, oxyR285 and pncA57 allows the
differentiation of Mycobacterium strains of the Mycobacterium tuberculosis complex (see
example 4).

The present invention provides an in vitro method for the detection and identification
of Mycobacteria from the Mycobacterium tuberculosis complex in a biological sample,
comprising the following steps:

a) analysis for the presence or the absence of a nucleic acid fragment of the sequence
TbD1, and

b) analysis of at least one additional genetic marker selected among RD1, RD2, RD3,
RD4, RD5, RD6, RD7, RD8, RD9, RD10, RD11, RD13, RD14, RvD1, RvD2,
RvD3, RvD4, RvD5, katG463, gyrA95, oxyR285 and pncA57.

In a preferred embodiment, two additional markers are used, preferably RD4 and RD9.
The analysis is performed by a technique selected among sequence hybridization, nucleic
acid amplification and detection of antigen-antibody complexes.
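The text names RD4 and RD9 as preferred companions of TbD1 but does not spell out how the three presence/absence results are combined. The sketch below is one possible reading, with its assumptions stated: TbD1 deleted only in "modern" M. tuberculosis and RD9 deleted in the M. africanum, M. microti and M. bovis lineage (both stated in the experimental section), and RD4 deleted only in M. bovis and M. bovis BCG (an assumption drawn from the wider RD-deletion literature, not from this document). The function name and output labels are hypothetical, and these three markers alone do not separate M. canettii from ancestral M. tuberculosis.

```python
from typing import Dict


def classify_mtbc(present: Dict[str, bool]) -> str:
    """present maps 'TbD1', 'RD9' and 'RD4' to True when the region is NOT deleted."""
    tbd1, rd9, rd4 = present["TbD1"], present["RD9"], present["RD4"]
    if not tbd1:
        # TbD1-deleted ("modern") M. tuberculosis is expected to keep RD9 and RD4.
        return "modern M. tuberculosis" if rd9 and rd4 else "unexpected pattern"
    if rd9:
        # No deletion among the three markers: ancestral lineage.
        return "ancestral M. tuberculosis or M. canettii" if rd4 else "unexpected pattern"
    # RD9-deleted lineage; RD4 loss is assumed to be restricted to M. bovis and BCG.
    return "M. africanum or M. microti" if rd4 else "M. bovis or M. bovis BCG"


print(classify_mtbc({"TbD1": True, "RD9": False, "RD4": False}))  # M. bovis or M. bovis BCG
```

Patterns that contradict the assumed order of deletions are reported rather than forced into a group, since they may indicate a failed PCR or a strain outside this simple scheme.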

It is also a goal of the present invention to provide a kit for the detection and
identification of Mycobacteria from the Mycobacterium tuberculosis complex in a
biological sample, comprising the following elements:

a) at least one pair of primers selected among nucleic acid fragments of the
invention, and more preferably selected among the primers chosen from the
group comprising SEQ ID No. 13, SEQ ID No. 14, SEQ ID No. 15, SEQ ID
No. 16, SEQ ID No. 17 and SEQ ID No. 18,

b) at least one pair of primers specific for the genetic markers selected among
RD1, RD2, RD3, RD4, RD5, RD6, RD7, RD8, RD9, RD10, RD11, RD13,
RD14, RvD1, RvD2, RvD3, RvD4, RvD5, katG463, gyrA95, oxyR285 and pncA57,

c) the reagents necessary to carry out a DNA amplification reaction,

d) optionally, the necessary components which make it possible to verify or
compare the sequence and/or the size of the amplified fragment.
In a preferred embodiment, the kit comprises the following elements:

a) at least one pair of primers selected among nucleic acid fragments of the
invention, and more preferably selected among the primers chosen from the
group comprising SEQ ID No. 13, SEQ ID No. 14, SEQ ID No. 15, SEQ ID No. 16,
SEQ ID No. 17 and SEQ ID No. 18,

b) one pair of primers specific for the genetic marker RD4,

c) one pair of primers specific for the genetic marker RD9,

d) the reagents necessary to carry out a DNA amplification reaction,

e) optionally, the necessary components which make it possible to verify or
compare the sequence and/or the size of the amplified fragment.

The figures and examples presented below are provided as further guidance to the
practitioner of ordinary skill in the art and are not to be construed as limiting the invention in
any way.

FIGURES 

Figure 1: Amplicons obtained from strains that have the indicated genomic region present
or deleted. Sizes of amplicons in each group are uniform. Numbers correspond to strain
designations used in Kremer et al. (1999, J. Clin. Microbiol. 37: 2607-2618) (Ref. 8) and
Supply et al. (2001, J. Clin. Microbiol. 39: 3563-3571) (Ref. 9).

Figure 2: Sequences in the TbD1 region obtained from strains of various geographic
regions.

* refers to groups based on the katG463/gyrA95 sequence polymorphism defined by Sreevatsan
and colleagues (Ref. 2). Numbers correspond to strain designations used in Kremer et al.
(1999, J. Clin. Microbiol. 37: 2607-2618) (Ref. 8) and Supply et al. (2001, J. Clin. Microbiol.
39: 3563-3571) (Ref. 9).

Figure 3: Spoligotypes of selected M. tuberculosis and M. bovis strains. Numbers
correspond to strain designations used in Kremer et al. (1999, J. Clin. Microbiol. 37:
2607-2618) (Ref. 8) and Supply et al. (2001, J. Clin. Microbiol. 39: 3563-3571) (Ref. 9).

Figure 4: Scheme of the proposed evolutionary pathway of the tubercle bacilli illustrating
successive loss of DNA in certain lineages (grey boxes). The scheme is based on the presence or
absence of conserved deleted regions and on sequence polymorphisms in five selected genes.
Note that the distances between certain branches may not correspond to actual phylogenetic
differences calculated by other methods.

Blue arrows indicate that strains are characterized by katG463 CTG (Leu), gyrA95 ACC
(Thr), typical for group 1 organisms. Green arrows indicate that strains belong to group 2,
characterized by katG463 CGG (Arg), gyrA95 ACC (Thr). The red arrow indicates that
strains belong to group 3, characterized by katG463 CGG (Arg), gyrA95 AGC (Ser), as
defined by Sreevatsan and colleagues (Sreevatsan et al., 1997, Proc. Natl. Acad. Sci. USA
94: 9869-9874) (Ref. 2).
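The grouping described in the Figure 4 legend is a direct lookup on two codons, so it can be written as a small table. This is a sketch for illustration; the dictionary and function names are hypothetical, and any codon combination outside the three listed groups is returned as None rather than guessed.

```python
from typing import Optional

# (katG codon 463, gyrA codon 95) -> group, as listed in the Figure 4 legend
GROUPS = {
    ("CTG", "ACC"): 1,  # katG463 Leu, gyrA95 Thr
    ("CGG", "ACC"): 2,  # katG463 Arg, gyrA95 Thr
    ("CGG", "AGC"): 3,  # katG463 Arg, gyrA95 Ser
}


def sreevatsan_group(katg463: str, gyra95: str) -> Optional[int]:
    """Return group 1, 2 or 3 for the given codons, or None if the pair is not listed."""
    return GROUPS.get((katg463.upper(), gyra95.upper()))


print(sreevatsan_group("cgg", "acc"))  # 2
```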

EXAMPLES

1. MATERIAL AND METHODS: 

1.1. Bacterial Strains: The 100 M. tuberculosis complex strains comprised 46 M.
tuberculosis strains isolated in 30 countries, 14 M. africanum strains, 28 M. bovis strains
originating in 5 countries, 2 M. bovis BCG vaccine strains (Pasteur and Japan), 5 M. microti
strains, and 5 M. canettii strains. The strains were isolated from human and animal sources
and were selected to represent a wide diversity, including 60 strains that have been used in a
multi-center study (8). The M. africanum strains were retrieved from the collection of the
Wadsworth Center, New York State Department of Health, Albany, New York, whereas the
majority of the M. bovis isolates came from the collection of the University of Zaragoza,
Spain. Four M. canettii strains are from the culture collection of the Institut Pasteur, Paris,
France. The strains have been extensively characterized by reference typing methods, i.e.
IS6110 restriction fragment length polymorphism (RFLP) typing and spoligotyping. M.
tuberculosis H37Rv, M. tuberculosis H37Ra, M. tuberculosis CDC1551, M. bovis
AF2122/97, M. microti OV254, and M. canettii CIPT 140010059 were included as reference
strains. DNA was prepared as previously described (10).

1.2. Genome comparisons and primer design

For preliminary genome comparisons between M. tuberculosis and M. bovis, the websites
http://genolist.pasteur.fr/TubercuList/ and http://www.sanger.ac.uk/Projects/M_bovis/ as
well as in-house databases were used. For primer design, sequences inside or flanking RD
and RvD regions were obtained from the same websites. Primers that would amplify ca. 500
base pair fragments in the reference strains (Table 1) were designed using the Primer3
website http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi.
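Before ordering primers, the expected product size of a pair can be checked in silico against a reference sequence. The sketch below does this by exact string matching only: it is not Primer3 and ignores mismatches, degenerate bases and the opposite strand orientation of the whole pair. The function names and the toy template are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, forward: str, reverse: str) -> Optional[int]:
    """Product size in bp for an exact-match primer pair on the top strand, or None.

    The reverse primer is written 5'->3' on the bottom strand, so its reverse
    complement is what appears on the top strand, downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    target = reverse_complement(reverse.upper())
    end = template.find(target, start + len(forward))
    if end == -1:
        return None
    return end + len(target) - start


# Toy template: forward site, 20 bp spacer, then the site matched by the reverse primer
toy = "TTTT" + "ACGTTGCA" + "G" * 20 + "CCATGGAA" + "TTTT"
print(in_silico_amplicon(toy, "ACGTTGCA", "TTCCATGG"))  # 36
```

Running the same pair against each reference sequence gives the expected sizes (around 500 bp for the RD primers described above) against which the gel bands can be read.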

1.3. RD-PCR analysis

Reactions were performed in 96-well plates and contained per reaction 1.25 µl of 10 x PCR
buffer (600 mM Tris-HCl pH 8.8, 20 mM MgCl2, 170 mM (NH4)2SO4, 100 mM
β-mercaptoethanol), 1.25 µl of 20 mM nucleotide mix, 50 nM of each primer, 1-10 ng of template
DNA, 10% DMSO, 0.2 units of Taq polymerase (Gibco-BRL) and sterile distilled water to 12.5
µl. Thermal cycling was performed on a PTC-100 amplifier (MJ Inc.) with an initial
denaturation step of 90 seconds at 95°C, followed by 35 cycles of 30 seconds at 95°C, 1 min
at 58°C, and 4 min at 72°C.
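The cycling program above can be written down as data, which makes it easy to check how long a plate occupies the thermocycler. The sketch below encodes the stated steps and sums the programmed hold times; the class and function names are hypothetical, and ramping between temperatures is deliberately ignored, so the real run time will be somewhat longer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    temp_c: float
    seconds: int


@dataclass
class Stage:
    steps: List[Step]
    cycles: int = 1


def total_hold_seconds(program: List[Stage]) -> int:
    """Sum of programmed hold times across all stages (ramp times excluded)."""
    return sum(stage.cycles * sum(step.seconds for step in stage.steps) for stage in program)


RD_PCR = [
    Stage([Step(95, 90)]),                                          # initial denaturation
    Stage([Step(95, 30), Step(58, 60), Step(72, 240)], cycles=35),  # amplification cycles
]

print(total_hold_seconds(RD_PCR) / 60)  # 194.0 (minutes)
```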

1.4. Sequencing of junction regions (RDs, TbD1), katG, gyrA, oxyR and pncA genes
PCR products were obtained as described above, using primers listed in Table 1.

For primer elimination, 6 µl of PCR product was incubated with 1 unit of Shrimp
Alkaline Phosphatase (USB), 10 units of exonuclease I (USB), and 2 µl of 5 x buffer
(200 mM Tris-HCl pH 8.8, 5 mM MgCl2) for 15 min at 37°C and then for 15 min at 80°C. To
this reaction mixture, 2 µl of Big Dye sequencing mix (Applied Biosystems), 2 µl (2 nM) of
primer and 3 µl of 5 x buffer (5 mM MgCl2, 200 mM Tris-HCl pH 8.8) were added, and 35
cycles (96°C for 30 sec; 56°C for 15 sec; 60°C for 4 min) were performed in a thermocycler
(MJ Research Inc., Watertown, MA). DNA was precipitated using 80 µl of 76% ethanol,
centrifuged, rinsed with 70% ethanol, and dried. Reactions were dissolved in 2 µl of
formamide/EDTA buffer, denatured and loaded onto 48 cm, 4% polyacrylamide gels, and
electrophoresis was performed on 377 automated DNA sequencers (Applied Biosystems) for
10 to 12 h. Alternatively, reactions were dissolved in 0.3 mM EDTA buffer and subjected to
automated sequencing on a 3700 DNA sequencer (Applied Biosystems). Reactions generally
gave between 500 and 700 bp of unambiguous sequence.

1.5. Accession Numbers
The sequence of the TbD1 region from the ancestral M. tuberculosis strain No. 74
(Ref. 8), containing genes mmpS6 and mmpL6, was deposited in the EMBL database under
accession No. AJ426486. Sequences bordering RD4, RD7, RD8, RD9 and RD10 in BCG are
available under accession numbers AJ003103, AJ007301, AJ131210, Y18604, and
AJ132559, respectively.
2. EXPERIMENTAL DATA:

The distribution of 20 variable regions resulting from insertion-deletion events in the
genomes of the tubercle bacilli has been evaluated in a total of 100 strains of Mycobacterium
tuberculosis, M. africanum, M. canettii, M. microti and M. bovis. This approach showed that
the majority of these polymorphisms did not occur independently in the different strains of
the M. tuberculosis complex but, rather, result from ancient, irreversible genetic events in
common progenitor strains. Based on the presence or absence of an M. tuberculosis specific
deletion (TbD1), M. tuberculosis strains can be divided into ancestral and "modern" strains,
the latter comprising representatives of major epidemics like the Beijing, Haarlem and
African M. tuberculosis clusters. Furthermore, successive loss of DNA, reflected by RD9
and other subsequent deletions, was identified for an evolutionary lineage represented by
M. africanum, M. microti and M. bovis that diverged from the progenitor of the present
M. tuberculosis strains before TbD1 occurred. These findings contradict the often-presented
hypothesis that M. tuberculosis, the etiological agent of human tuberculosis, evolved from
M. bovis, the agent of bovine disease. M. canettii and ancestral M. tuberculosis strains lack
none of these deleted regions and therefore appear to be direct descendants of tubercle bacilli
that existed before the M. africanum-M. bovis lineage separated from the M. tuberculosis
lineage. This suggests that the common ancestor of the tubercle bacilli resembled
M. tuberculosis or M. canettii and could well have been a human pathogen already.

The mycobacteria grouped in the M. tuberculosis complex are characterized by
99.9% similarity at the nucleotide level and identical 16S rRNA sequences (1, 2) but differ
widely in terms of their host tropisms, phenotypes and pathogenicity. Assuming that they are
all derived from a common ancestor, it is intriguing that some are exclusively human
(M. tuberculosis, M. africanum, M. canettii) or rodent pathogens (M. microti), whereas others
have a wide host spectrum (M. bovis). What was the genetic organization of the last common
ancestor of the tubercle bacilli, and in which host did it live? Which genetic events may have
contributed to the fact that the host spectrum is so different and often specific? Where and
when did M. tuberculosis evolve? Answers to these questions are important for a better
understanding of the pathogenicity and the global epidemiology of tuberculosis and may
help to anticipate future trends in the spread of the disease.

Because of the unusually high degree of conservation in their housekeeping genes, it
has been suggested that the members of the M. tuberculosis complex underwent an
evolutionary bottleneck at the time of speciation, estimated to have occurred roughly
15,000-20,000 years ago (2). It has also been speculated that M. tuberculosis, the most
widespread etiological agent of human tuberculosis, evolved from M. bovis, the agent of bovine
tuberculosis, by specific adaptation of an animal pathogen to the human host (3). However,
both hypotheses were proposed before the whole genome sequence of M. tuberculosis (4)
was available and before comparative genomics uncovered several variable genomic regions
in the members of the M. tuberculosis complex. Differential hybridization arrays identified
14 regions (RD1-14) ranging in size from 2 to 12.7 kb that were absent from BCG Pasteur
relative to M. tuberculosis H37Rv (5, 6). In parallel, six regions, RvD1-5 and TbD1, that
were absent from the M. tuberculosis H37Rv genome relative to other members of the
M. tuberculosis complex were revealed by comparative genomics approaches employing
pulsed-field gel electrophoresis (PFGE) techniques (5, 7) and in silico comparisons of the
near-complete M. bovis AF2122/97 genome sequence and the M. tuberculosis H37Rv
sequence.

In the present study the inventors have analyzed the distribution of these 20 variable
regions situated around the genome (Table 1) in a representative and diverse set of 100
strains belonging to the M. tuberculosis complex. The strains were isolated from different
hosts and from a broad range of geographic origins, and exhibit a wide spectrum of typing
characteristics like IS6110 and spoligotype hybridization patterns or variable-number tandem
repeats of mycobacterial interspersed repetitive units (MIRU-VNTR) (8, 9). The inventors
have found striking evidence that deletion of certain variable genomic regions did not occur
independently in the different strains of the M. tuberculosis complex and, assuming that
there is little or no recombination of chromosomal segments between the various lineages of
the complex, this allows the inventors to propose a completely new scenario for the
evolution of the M. tuberculosis complex and the origin of human tuberculosis.

Variable genomic regions and their occurrence in the members of the M. tuberculosis
complex.

The PCR screening assay for the 20 variable regions (Table 1) within 46 M.
tuberculosis, 14 M. africanum, 5 M. canettii, 5 M. microti, 28 M. bovis and 2 BCG strains
employed oligonucleotides internal to known RDs and RvDs, as well as oligonucleotides
flanking these regions (Table 1). This approach generated a large data set that was robust,
highly reliable, and internally controlled, since PCR amplicons obtained with the internal
primer pair correlated with the absence of an appropriately sized amplicon with the flanking
primer pair, and vice versa.

According to the conservation of junction sequences flanking the variable regions,
three types of regions were distinguished, each having different importance as an
evolutionary marker. The first type included mobile genetic elements, like the prophages
phiRv1 (RD3) and phiRv2 (RD11) and insertion sequences IS1532 (RD6) and IS6110
(RD5), whose distribution in the tubercle bacilli was highly divergent (Table 2). The second
type of deletion is mediated by homologous recombination between adjacent IS6110
insertion elements, resulting in the loss of the intervening DNA segment (RvD2, RvD3,
RvD4, and RvD5 (7)), and is variable from strain to strain (Table 2).

The third type includes deletions whose bordering genomic regions typically do not
contain repetitive sequences. Often this type of deletion occurred in coding regions, resulting
in the truncation of genes that are still intact in other strains of the M. tuberculosis complex.
The exact mechanism leading to this type of deletion remains obscure, but rare
strand slippage errors of DNA polymerase may have contributed to this event. As shown in
detail below, RD1, RD2, RD4, RD7, RD8, RD9, RD10, RD12, RD13, RD14, and TbD1 are
representatives of this third group, whose distribution among the 100 strains allows us to
propose an evolutionary scenario for the members of the M. tuberculosis complex that
identifies M. tuberculosis and/or M. canettii as most closely related to the common ancestor
of the tubercle bacilli.



2.1. M. tuberculosis strains:

Investigation of the 46 M. tuberculosis strains by deletion analysis revealed that most
RD regions were present in all M. tuberculosis strains tested (Table 2). Only regions RD3
and RD11, corresponding to the two prophages phiRv1 and phiRv2 of M. tuberculosis
H37Rv (4), RD6 containing the insertion sequence IS1532, and RD5 that is flanked by a
copy of IS6110 (5) were absent in some strains. This is an important observation, as it implies
that M. tuberculosis strains are highly conserved with respect to RD1, RD2, RD4, RD7,
RD8, RD9, RD10, RD12, RD13, and RD14, and that these RDs represent regions that can
differentiate M. tuberculosis strains, independent of their geographical origin and their
typing characteristics, from certain other members of the M. tuberculosis complex.
Furthermore, this suggests that these regions may be involved in the host specificity of
M. tuberculosis.

In contrast, the presence or absence of RvD regions in M. tuberculosis strains was
variable. The region which showed the greatest variability was RvD2, since 18 of the 46
tested M. tuberculosis strains did not carry the RvD2 region. Strains with a high copy
number of IS6110 (>14) lacked regions RvD2 to RvD5 more often than strains with only a
few copies. As an example, all six tested strains belonging to the Beijing cluster (8) lacked
regions RvD2 and RvD3. This is in agreement with the proposed involvement of
recombination between two adjacent copies of IS6110 in this deletion event (7).

However, the most surprising finding concerning the RvD regions was that TbD1 was absent from 40 of the tested M. tuberculosis strains (87%), including representative strains from major epidemics such as the Haarlem, Beijing and Africa clusters (8). To emphasize this result, we named this region "M. tuberculosis specific deletion 1" (TbD1). In silico sequence comparison of M. tuberculosis H37Rv with the corresponding section in M. bovis AF2122/97 revealed that in M. bovis this locus comprises two genes encoding membrane proteins belonging to a large family, whereas in M. tuberculosis H37Rv one of these genes (mmpS6) was absent and the second was truncated (mmpL6). Unlike the RvD2-RvD5 deletions, the TbD1 region is not flanked by a copy of IS6110 in M. tuberculosis H37Rv, suggesting that insertion elements were not involved in the deletion of the 2153 bp fragment. To further investigate whether the 40 M. tuberculosis strains lacking the TbD1 region had the same genomic organization of this locus as M. tuberculosis H37Rv, we amplified the TbD1 junction regions of the various strains by PCR using primers flanking the deleted region (Table 1). This approach showed that the size of the amplicons obtained from multiple strains was uniform (Fig. 1), and subsequent sequence analysis of the PCR products revealed that in all tested TbD1-deleted strains the sequence of the junction regions was identical to that of M. tuberculosis H37Rv (Fig. 2). The perfect conservation of the junction sequences in TbD1-deleted strains of wide geographical diversity suggests that the genetic event which resulted in the deletion occurred in a common progenitor. However, six M. tuberculosis strains, all characterized by very few or no copies of IS6110 and by spoligotypes that resembled each other (Fig. 3), still had the TbD1 region present. Interestingly, these six strains were also clustered together by MIRU-VNTR analysis (9).
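The junction-PCR reasoning above (primers flanking a deletion give a short product when the intervening region is missing, while internal primers give a product only when it is present) can be illustrated with an exact-match in silico PCR. This is a minimal sketch under stated assumptions: the sequences are randomly generated placeholders, not the real TbD1 locus, the primers are not those of Table 1, and real primer annealing tolerates mismatches that this exact string search ignores.

```python
import random
from typing import Optional

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def amplicon_length(template: str, fwd: str, rev: str) -> Optional[int]:
    """Predicted product size (bp) for a primer pair on a template written 5'->3'.

    The forward primer must match the template as written; the reverse primer
    anneals to the opposite strand, so its reverse complement must lie downstream.
    Exact matches only; returns None if either site is missing.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start < 0:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return end + len(rev_site) - start


def random_dna(rng: random.Random, n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


# Hypothetical toy locus: 150-bp flanks around a 2153-bp stand-in for TbD1
rng = random.Random(1)
left, insert, right = random_dna(rng, 150), random_dna(rng, 2153), random_dna(rng, 150)
intact = left + insert + right   # region present
deleted = left + right           # region deleted, flanks joined

flank_fwd, flank_rev = left[:20], reverse_complement(right[-20:])
int_fwd, int_rev = insert[100:120], reverse_complement(insert[300:320])

print(amplicon_length(deleted, flank_fwd, flank_rev))  # 300
print(amplicon_length(intact, flank_fwd, flank_rev))   # 2453
print(amplicon_length(intact, int_fwd, int_rev))       # 220
print(amplicon_length(deleted, int_fwd, int_rev))      # None
```

In this toy model both alleles give a flanking-primer product, but the intact one is longer by the size of the region; identical short products across strains would be what a shared deletion with identical junctions predicts.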

Analysis of partial gene sequences of oxyR, pncA, katG and gyrA, which have been described as variable between different tubercle bacilli (2, 11, 12, 13), revealed that all tested M. tuberculosis strains showed oxyR and pncA partial sequences typical of M. tuberculosis (oxyR nucleotide 285 (oxyR285): G; pncA codon 57 (pncA57): CAC). Based on the katG codon 463 (katG463) and gyrA codon 95 (gyrA95) sequence polymorphisms, Sreevatsan and colleagues (2) defined three groups among the tubercle bacilli: group 1 showing katG463 CTG (Leu) and gyrA95 ACC (Thr), group 2 exhibiting katG463 CGG (Arg) and gyrA95 ACC (Thr), and group 3 showing katG463 CGG (Arg) and gyrA95 AGC (Ser). According to this scheme, in our study 16 of the 46 tested M. tuberculosis strains belonged to group 1, whereas 27 strains belonged to group 2 and only 3 isolates to group 3. Of the 40 strains that were deleted for region TbD1, 9 showed characteristics of group 1, including the strains belonging to the Beijing cluster, 28 of group 2, including the strains from the Haarlem and Africa clusters, and 3 of group 3, including H37Rv and H37Ra. Most interestingly, all six M. tuberculosis strains in which the TbD1 region was not deleted contained a leucine (CTG) at katG463, which was described as characteristic of ancestral M. tuberculosis strains (group 1) (2). As shown in Figure 4, this suggests that during the evolution of M. tuberculosis the katG mutation at codon 463, CTG (Leu) → CGG (Arg), occurred in a progenitor strain that had region TbD1 deleted. This proposal is supported by the finding that strains belonging to group 1 may or may not have deleted region TbD1, whereas all 30 strains belonging to groups 2 and 3 lacked TbD1 (Fig. 4). Furthermore, all strains of groups 2 and 3 characteristically lacked spacer sequences 33-36 in the direct repeat (DR) region (Fig. 3). It appears that such spacers may be lost but not gained (14). Therefore, TbD1-deleted strains will be referred to hereafter as "modern" M. tuberculosis strains.
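The three-group scheme of Sreevatsan and colleagues, as summarized above, amounts to a lookup on two codons. The sketch below encodes only the three katG463/gyrA95 combinations listed in the text and returns None for any other combination; the function names are hypothetical, and the TbD1 check merely restates the observation that no group 2 or group 3 strain retained TbD1.

```python
from typing import Iterable, Optional, Tuple

# (katG463 codon, gyrA95 codon) -> group, as listed in the text (ref. 2)
SREEVATSAN_GROUPS = {
    ("CTG", "ACC"): 1,  # katG463 Leu, gyrA95 Thr
    ("CGG", "ACC"): 2,  # katG463 Arg, gyrA95 Thr
    ("CGG", "AGC"): 3,  # katG463 Arg, gyrA95 Ser
}


def principal_group(katg463: str, gyra95: str) -> Optional[int]:
    """Return group 1, 2 or 3, or None for a combination the scheme does not cover."""
    return SREEVATSAN_GROUPS.get((katg463.upper(), gyra95.upper()))


def tbd1_consistent(strains: Iterable[Tuple[str, str, bool]]) -> bool:
    """True if no group 2 or group 3 strain still carries TbD1.

    Each strain is given as (katG463 codon, gyrA95 codon, TbD1 present?).
    """
    for katg, gyra, tbd1_present in strains:
        if principal_group(katg, gyra) in (2, 3) and tbd1_present:
            return False
    return True


print(principal_group("CTG", "ACC"))  # 1
print(tbd1_consistent([("CTG", "ACC", True), ("CGG", "AGC", False)]))  # True
```

A group 1 strain may carry TbD1 or not, so only groups 2 and 3 are tested; a single group 2 or 3 strain with TbD1 present would contradict the proposed order of events.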
2.2. M. canettii:

M. canettii is a very rare smooth variant of M. tuberculosis, usually isolated from patients from, or with connections to, Africa. Although it shares identical 16S rRNA sequences with the other members of the M. tuberculosis complex, M. canettii strains differ in many respects, including polymorphisms in certain housekeeping genes, IS1081 copy number, colony morphology and the lipid content of the cell wall (15, 16). Therefore, we were surprised to find that in M. canettii all the RD, RvD and TbD1 regions except the prophages (phiRv1, phiRv2) were present. In contrast, we identified a region (RDcan) that was specifically absent from all five M. canettii strains and partially overlapped RD12 (Fig. 4).

The conservation of the RD, RvD and TbD1 regions in the genome of M. canettii, in conjunction with the many described and observed differences, suggests that M. canettii diverged from the common ancestor of the M. tuberculosis complex before the RD, RvD and TbD1 deletions occurred in the lineages of tubercle bacilli (Fig. 4). This hypothesis is supported by the finding that M. canettii was shown to carry 26 unique spacer sequences in the direct repeat region (14) that are no longer present in any other member of the M. tuberculosis complex. Therefore, M. canettii represents a fascinating tubercle bacillus, whose detailed genomic analysis may reveal further insights into the evolution of the M. tuberculosis complex.

2.3. M. africanum: 

The isolates designated as M. africanum studied here originate from West and East African sources. Eleven strains were isolated in Sierra Leone, Nigeria and Guinea, and two strains in Uganda. One strain comes from the Netherlands.

For the 11 West African isolates, RD analysis indicated that these strains all lack the RD9 region containing cobL. Sequence analysis of the RD9 junction region showed that the genetic organization of this locus in West African strains was identical to that of M. bovis and M. microti, in that the 5' part of cobL as well as the genes Rv2073c and Rv2074c were absent. In addition, six strains (2 from Sierra Leone, 4 from Guinea) also lacked RD7, RD8 and RD10 (Table 2). The junction sequences bordering RD7, RD8 and RD10, like those for RD9, were identical to those of M. bovis and M. microti strains. As regards the two prophages phiRv1 and phiRv2, the West African strains all contained phiRv2, whereas phiRv1 was absent. No variability was seen for the RvD regions: RvD1-RvD5 and TbD1 were present in all tested West African strains. This shows that M. africanum prevalent in West Africa can be differentiated from "modern" M. tuberculosis by at least two variable genetic markers, namely the absence of region RD9 and the presence of region TbD1.

In contrast, for the East African M. africanum isolates and for the isolate from the Netherlands, no genetic marker was found that could differentiate them from M. tuberculosis strains. With the exception of prophage phiRv1 (RD3), the three strains from Uganda and the Netherlands did not exhibit any of the RD deletions, but lacked the TbD1 region, as do "modern" M. tuberculosis strains. The absence of the TbD1 region was also confirmed by sequence analysis of the TbD1 junction region, which was found to be identical to that of TbD1-deleted M. tuberculosis strains. These results indicate a very close genetic relationship of these strains to M. tuberculosis and suggest that they should be regarded as M. tuberculosis rather than M. africanum strains.

2.4. M. microti:

M. microti strains were isolated in the 1930s from voles (17) and more recently from immunosuppressed patients (18). These strains share an identical, characteristic spoligotype but differ in their IS6110 profiles. Both the vole and the human isolates lacked regions RD7, RD8, RD9 and RD10, as well as a region that is specifically deleted from M. microti (RDmic). RDmic was revealed by a detailed comparative genomics study of M. microti isolates (19) using clones from an M. microti bacterial artificial chromosome (BAC) library. RDmic partially overlaps RD1 from BCG (data not shown). Furthermore, vole isolates lacked part of the RD5 region, whereas this region was present in the human isolate. As the junction region of RD5 in M. microti was different from that in BCG (data not shown), RD5 was not used as an evolutionary marker.



2.5. M. bovis and M. bovis BCG:

M. bovis has a very large host spectrum, infecting many mammalian species, including man. The collection of M. bovis strains that was screened for the RD and RvD regions consisted of 2 BCG strains and 18 "classical" M. bovis strains from bovine, llama and human sources, generally characterized by only one or two copies of IS6110, in addition to three goat isolates, three seal isolates, two oryx isolates, and two M. bovis strains from humans that presented a higher number of IS6110 copies.

Excluding prophages, the distribution of RDs allowed us to differentiate five main groups among the tested M. bovis strains. The first group was formed by strains that lack RD7, RD8, RD9 and RD10. Representatives of this group are three seal isolates and two human isolates containing between three and five copies of IS6110 (data not shown). Two oryx isolates harboring between 17 and 20 copies of IS6110 formed the second group, which lacked parts of RD5 in addition to RD7-RD10 and very closely resembled the M. microti isolates. However, they did not show RDmic, the deletion characteristic of M. microti strains (data not shown). Analysis of partial oxyR and pncA sequences from strains belonging to groups one and two showed sequence polymorphisms characteristic of M. tuberculosis strains (oxyR285: G; pncA57: CAC; refs. 12, 13).

Group three consists of goat isolates that lack regions RD5, RD7, RD8, RD9, RD10, RD12 and RD13. As previously described by Aranaz and colleagues, these strains exhibited an adenine at position 285 of the oxyR pseudogene that is specific for "classical" M. bovis strains, whereas the sequence of the pncA polymorphism was identical to that in M. tuberculosis (20). This is in good agreement with our results from sequence analysis (Table 2) and with the finding that, except for RD4, the goat isolates displayed the same deletions as "classical" M. bovis strains. Taken together, this suggests that the oxyR285 mutation (G → A) occurred in M. bovis strains before RD4 was lost. Interestingly, the most common M. bovis strains ("classical" M. bovis (21)), isolated from cattle from Argentina, the Netherlands, the UK and Spain, as well as from humans (e.g. multidrug-resistant M. bovis from Spain (22)), showed the greatest number of RD deletions and appear to have undergone the greatest loss of DNA relative to other members of the M. tuberculosis complex. These lacked regions RD4, RD5, RD6, RD7, RD8, RD9, RD10, RD12 and RD13, confirming results obtained with reference strains (5, 6). These strains, together with the two BCG strains, were the only ones that showed the pncA57 polymorphism GAC (Asp) in addition to the oxyR285 mutation (G → A) characteristic of M. bovis. Analysis of BCG strains indicated that BCG lacked the same RD regions as "classical" M. bovis strains, in addition to RD1, RD2 and RD14, deletions which apparently occurred during and after the attenuation process (Fig. 4) (6, 23).

In contrast to RDs, the RvD regions were highly conserved in the M. bovis strains. With the exception of the two IS6110-rich oryx isolates, which lacked RvD2, RvD3 and RvD4, all other strains had the five RvD regions present. It is particularly noteworthy that TbD1 was present in all M. bovis strains.

However, except for the two human isolates from group 1 containing between three and five copies of IS6110, strains designated as M. bovis showed a single nucleotide polymorphism in the TbD1 region at codon 551 (AAG) of the mmpL6 gene, relative to M. canettii, M. africanum and ancestral M. tuberculosis strains, which are characterized by codon AAC. Even the strains isolated from seals and from oryx, with oxyR or pncA loci like those of M. tuberculosis and with fewer deleted regions than the classical M. bovis strains, showed the mmpL6 codon 551 AAG polymorphism typical of M. bovis and M. microti (Table 2, Fig. 4). As such, this polymorphism could serve as a very useful genetic marker for the differentiation of strains that lack RD7, RD8, RD9 and RD10 and have been classified as M. bovis or M. africanum, but may differ from other strains of the same taxon.
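Reading the mmpL6 marker comes down to extracting codon 551 from an in-frame coding sequence and comparing it with AAC or AAG. A minimal sketch, assuming the input is an mmpL6 coding sequence numbered from its annotated start codon; the toy sequence and the function names are hypothetical, and the sequencing step itself is not modeled.

```python
def codon_at(cds: str, codon_number: int) -> str:
    """Return the codon at a 1-based position of an in-frame coding sequence."""
    if codon_number < 1:
        raise ValueError("codon numbers start at 1")
    start = (codon_number - 1) * 3
    codon = cds[start:start + 3].upper()
    if len(codon) != 3:
        raise ValueError("sequence too short for the requested codon")
    return codon


# Codon 551 states described in the text
MMPL6_551_TYPES = {
    "AAC": "AAC: M. canettii, M. africanum, ancestral M. tuberculosis type",
    "AAG": "AAG: M. bovis and M. microti type",
}


def mmpl6_551_type(cds: str) -> str:
    """Classify the codon 551 state; anything other than AAC/AAG is reported as 'other'."""
    return MMPL6_551_TYPES.get(codon_at(cds, 551), "other")


# Hypothetical toy CDS: start codon, 549 filler codons, then AAG as codon 551
toy_cds = "ATG" + "GCT" * 549 + "AAG" + "TAA"
print(codon_at(toy_cds, 551))  # AAG
print(mmpl6_551_type(toy_cds))
```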



3. DISCUSSION

3.1. Origin of human tuberculosis

For many years, it was thought that human tuberculosis evolved from the bovine disease by adaptation of an animal pathogen to the human host (3). This hypothesis is based on the property of M. tuberculosis to be almost exclusively a human pathogen, whereas M. bovis has a much broader host range. However, the results from this study unambiguously show that M. bovis has undergone numerous deletions relative to M. tuberculosis. This is confirmed by the preliminary analysis of the near-complete genome sequence of M. bovis AF2122/97, a "classical" M. bovis strain isolated from cattle, which revealed no new gene clusters confined specifically to M. bovis. This indicates that the genome of M. bovis is smaller than that of M. tuberculosis (24). It seems plausible that M. bovis is the final member of a separate lineage represented by M. africanum (RD9), M. microti (RD7, RD8, RD9, RD10) and M. bovis (RD4, RD5, RD7, RD8, RD9, RD10, RD12, RD13) (25) that branched from the progenitor of M. tuberculosis isolates. Successive loss of DNA may have contributed to clonal expansion and the appearance of more successful pathogens in certain new hosts.
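The argument for a lineage of successive DNA loss can be checked mechanically: if a deleted region cannot be regained, each deletion set listed above should contain the previous one. The sketch uses only the regions quoted in this paragraph; lineage-specific deletions such as the M. microti-specific region would sit on side branches and are deliberately left out. The helper names are hypothetical.

```python
from typing import List, Set, Tuple

# Deleted regions per taxon, in the order listed in the text (ref. 25)
LINEAGE: List[Tuple[str, Set[str]]] = [
    ("M. africanum", {"RD9"}),
    ("M. microti", {"RD7", "RD8", "RD9", "RD10"}),
    ("M. bovis", {"RD4", "RD5", "RD7", "RD8", "RD9", "RD10", "RD12", "RD13"}),
]


def region_number(name: str) -> int:
    """Sort key so that RD10 follows RD9 rather than RD1 (names are 'RD<n>')."""
    return int(name[2:])


def is_nested(lineage: List[Tuple[str, Set[str]]]) -> bool:
    """True if each taxon's deletions include all deletions of the taxon before it."""
    return all(prev <= nxt for (_, prev), (_, nxt) in zip(lineage, lineage[1:]))


def additional_losses(lineage: List[Tuple[str, Set[str]]]) -> List[Tuple[str, List[str]]]:
    """For each taxon after the first, list deletions not seen in its predecessor."""
    return [
        (name, sorted(nxt - prev, key=region_number))
        for (_, prev), (name, nxt) in zip(lineage, lineage[1:])
    ]


print(is_nested(LINEAGE))  # True
for taxon, lost in additional_losses(LINEAGE):
    print(taxon, lost)
# M. microti ['RD7', 'RD8', 'RD10']
# M. bovis ['RD4', 'RD5', 'RD12', 'RD13']
```

Nesting is a necessary condition for the proposed order of loss, not a proof of it.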

Whether the progenitor of extant M. tuberculosis strains was already a human pathogen when the M. africanum → M. bovis lineage separated from the M. tuberculosis lineage is a subject for speculation. However, we have two reasons to believe that this was the case. Firstly, the six ancestral M. tuberculosis strains (TbD1+, RD9+) (Fig. 3) that resemble the last common ancestor before the separation of M. tuberculosis and M. africanum are all human pathogens. Secondly, M. canettii, which probably diverged from the common ancestor of today's M. tuberculosis strains prior to any other known member of the M. tuberculosis complex, is also a human pathogen. Taken together, this means that the tubercle bacilli thought to most closely resemble the progenitor of M. tuberculosis are human and not animal pathogens. It is also intriguing that most of these strains were of African or Indian origin (Fig. 3). It is likely that these ancestral strains predominantly originated from endemic foci (15, 26), whereas "modern" M. tuberculosis strains that have lost TbD1 may represent epidemic M. tuberculosis strains that were introduced into the same geographical regions more recently as a consequence of the worldwide spread of the tuberculosis epidemic.

3.2. The evolutionary timescale of the M. tuberculosis complex

Because of the high sequence conservation in housekeeping genes, Sreevatsan et al. previously hypothesized that the tubercle bacilli encountered a major bottleneck 15,000-20,000 years ago (2). As the conservation of the TbD1 junction sequence in all tested TbD1-deleted strains suggests descent from a single clone, the TbD1 deletion is a perfect indicator that "modern" M. tuberculosis strains, which account for the vast majority of today's tuberculosis cases, definitely underwent such a bottleneck and then spread around the world.

As described in detail in the Results section, our analysis showed that the katG463 CTG → CGG and the subsequent gyrA95 ACC → AGC mutations, which were used by Sreevatsan and colleagues to designate groups 2 and 3 of their proposed evolutionary pathway of the tubercle bacilli (2), occurred in a lineage of M. tuberculosis strains that had already lost TbD1 (Fig. 4). Although deletions are more stable markers than point mutations, which may be subject to reversion, a perfect correlation of deletion and point mutation data was found for the tested strains.

This information, together with results from a recent study by Fletcher and colleagues (27), who showed that M. tuberculosis DNA amplified from naturally mummified Hungarian villagers of the 18th and 19th centuries belonged to katG463/gyrA95 groups 2 and 3, suggests that the TbD1 deletion occurred in the lineage of M. tuberculosis before the 18th century. This could mean that the dramatic increase in tuberculosis cases later in the 18th century in Europe mainly involved "modern" M. tuberculosis strains. In addition, it shows that tuberculosis was caused by M. tuberculosis and not by M. bovis, a fact that has also been described for cases in rural medieval England (28).

There is good evidence that mycobacterial infections occurred in man several thousand years ago. We know that tuberculosis occurred in Egypt during the reign of the pharaohs because spinal and rib lesions pathognomonic of tuberculosis have been identified in mummies from that period (29). Identification of acid-fast bacilli as well as PCR amplification of IS6110 from Peruvian mummies (30) also suggest that tuberculosis existed in pre-Columbian societies of Central and South America. To estimate when the TbD1 bottleneck occurred, it would now be very interesting to know whether the Egyptian and South American mummies carried M. tuberculosis DNA with or without the TbD1 deletion.

The other major bottleneck, which seems to have occurred for members of the M. africanum → M. microti → M. bovis lineage, is reflected by RD9 and the subsequent RD7, RD8 and RD10 deletions (Fig. 4). These deletions seem to have occurred in the progenitor of tubercle bacilli that today show natural host spectra as diverse as humans in Africa, voles on the Orkney Isles (UK), seals in Argentina, goats in Spain, and badgers in the UK. For this reason it is difficult to imagine that the spread and adaptation of RD9-deleted bacteria to their specific hosts could have occurred within the postulated 15,000-20,000 years of speciation of the M. tuberculosis complex.

However, more insight into this matter could be gained by RD analysis of ancient DNA samples, e.g. mycobacterial DNA isolated from a 17,000-year-old bison skeleton (31). The mycobacterium whose DNA was amplified showed a spoligotype that was most closely related to patterns of M. africanum and could have been an early representative of the lineage M. africanum → M. bovis. With the TbD1 and RD9 junction sequences that we supply here, PCR analyses of ancient DNA should enable very focused studies to be undertaken to learn more about the timescale within which the members of the M. tuberculosis complex have evolved.

3.3. Concluding comments

Our study provides an overview of the diversity and conservation of variable regions in a broad range of tubercle bacilli. Deletion analysis of 100 strains from various hosts and countries has identified some evolutionarily "old" M. canettii, M. tuberculosis and M. africanum strains, most of them of African origin, as well as "modern" M. tuberculosis strains, the latter including representatives from major epidemic clusters such as Beijing, Haarlem and Africa. The use of deletion analysis in conjunction with molecular typing and analysis of specific mutations was shown to represent a very powerful approach for the study of the evolution of the tubercle bacilli and for the identification of evolutionary markers. From a more practical perspective, these regions, primarily RD9 and TbD1 but also RD1, RD2, RD4, RD7, RD8, RD10, RD12 and RD13, represent very interesting candidates for the development of powerful diagnostic tools for the rapid and unambiguous identification of members of the M. tuberculosis complex (32). This genetic approach to differentiation can now be used to replace the often confusing traditional division of the M. tuberculosis complex into rigidly defined subspecies.

Moreover, functional analyses will show whether the TbD1 deletion confers some selective advantage on "modern" M. tuberculosis, or whether other circumstances contributed to the pandemic of the TbD1-deleted M. tuberculosis strains.



EXAMPLE 4 

The members of the M. tuberculosis complex share an unusually high degree of conservation, such that commercially available nucleic acid probes and amplification assays cannot differentiate these organisms. In addition, conventional identification methods are often ambiguous, cumbersome and time-consuming because of the slow growth of the organisms.

In the present invention, the inventors use a deletion analysis to solve the problem faced by clinical mycobacteriology laboratories in differentiating organisms within the M. tuberculosis complex.

This approach makes it possible to perform a diagnosis on a biological fluid using at least three markers, including TbD1. The following Table 3 illustrates such a combination, sufficient to distinguish between the members of the M. tuberculosis complex.
                                  Markers
Mycobacterium strain          RD4   RD9   TbD1
M. bovis BCG                   -     -     +
M. bovis                       -     -     +
M. africanum                   +     -     +
M. tuberculosis                +     +     -
Ancestral M. tuberculosis      +     +     +
M. canettii                    +     +     +

(+, region present; -, region absent)

Table 3

Besides the TbD1 marker, preferably at least two other markers should be used. Examples of such additional markers available in the literature are listed in the following Table 1.
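Table 3 can be read as a lookup from a three-marker presence/absence profile to candidate taxa. A minimal sketch, assuming each marker has already been called as present or absent (for example by the PCR approach described above); the profiles are transcribed from Table 3, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# (RD4 present, RD9 present, TbD1 present), transcribed from Table 3
TABLE_3: Dict[str, Tuple[bool, bool, bool]] = {
    "M. bovis BCG": (False, False, True),
    "M. bovis": (False, False, True),
    "M. africanum": (True, False, True),
    "M. tuberculosis": (True, True, False),
    "Ancestral M. tuberculosis": (True, True, True),
    "M. canettii": (True, True, True),
}


def candidate_taxa(rd4: bool, rd9: bool, tbd1: bool) -> List[str]:
    """Return every Table 3 entry whose marker profile matches the observed calls."""
    observed = (rd4, rd9, tbd1)
    return [name for name, profile in TABLE_3.items() if profile == observed]


print(candidate_taxa(rd4=True, rd9=False, tbd1=True))   # ['M. africanum']
print(candidate_taxa(rd4=False, rd9=False, tbd1=True))  # ['M. bovis BCG', 'M. bovis']
```

A profile that matches no entry (for example RD4 absent but RD9 present) yields an empty list rather than being forced into the nearest taxon. As the second call shows, these three markers alone leave M. bovis and BCG together, and likewise ancestral M. tuberculosis and M. canettii; the regions reported above as absent from BCG (RD1, RD2, RD14) or specifically absent from M. canettii could be added as further columns.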



Supplemental data:

Table 1: RD, RvD and TbD1 regions and selected primers

Columns: region; gene(s); size (kb); internal primer pair; flanking primer pair or second internal primer pair*.

Regions absent from BCG:



RD1    Rv3871-Rv3879c
RD2    Rv1978-Rv1988
RD3*   Rv1573-Rv1586c
RD4    Rv1505c-Rv1516c
RD5*   Rv2346c-Rv2353c
RD6*   Rv3425-Rv3428c
RD7    Rv1964-Rv1977
RD8    ephA-lpqG



9 5 RD11D-RV3878F 

GTC AGC CAA GTC AGO CTA CX: 

RDlm«Rv3878R 
CAA CX3T TOT GOT TGT TGA GG 
10^3 RD2-Rvl979JntF 

TAT AGC TCT CGG CAG GTT CC 

RD2-Rvl979-mt.R 
ATC GGC ATC TAT GTC GGT GT 
9^ RD3-Rvl586.intJ^ 

TTA TCT TOO OGT TGA CGA TO 

RD3-RvlS86.intJEl 
CAT ATA AGO GTG CCX: OCT AC 
J2.7 RD4-Rv] 5 1 6.mt.F 

CAA GGG GTA TGA GGT TCA CG 

RD4-Rvl516.int.R 
CGG TG A TIC OTG ATT GAA CA 
9 Q RD5A-Rv2348.intF 

AAT CAC OCT GOT GCT ACT CC 
RD5A-Rv2348.ml.R 
GTG CTT TTG GOT CTT GGT C 

4 9 RX)64S1S32F 

CAG CTG OTG AGT TCA AAT GC 
RD6*ISIS32R 
CTCCCOACACCTGTTCOT 
12J RD7-Rvl976iiiLF 

TOO ATT GTC G AC GGT ATG AA 

RD7-Rvl976.mt.R 
G GT CGA TAA GGT CAC GO A AC 

5 9 RD8-ep!iAJ^ 

GGT GTG ATT TGG TOA G AC G AT G 



RDl-flank.Ieft 
GAA ACA GTC CCC AGC AGO T 

RDl-flank.right 
TTC AAC GGG TTA CTG CGA AT 

RD2.fIaiikJ? 
CTC GAC COC G AC OAT GTO C 

RD2-flai]kJR 
CCT CGT TGT CAC COC GTA TO 

RD3Hnt-REP.F 
CTG AGO TCG TTG TCG AGO TA» 
RD3-lQt-RBP.R 
GTA CCC CCA GGC GAT CTT* 
RD4-flank.F 
CTC GTC GAA GGC CAC TAA AG 

RD4-«fiank.R 
AAG GCO AAC AGATTC AGC AT 

RD5B-plGA.inlJ? 
CAA OTT GGG TCT GGT CGA AT 

RDSB-pIcA.int.R 
OCT ACC CAA GGT CTC CTG GT 

ND 
ND 

RD7-flaiik,F 
GGT AAT CGTGGC CGA CAA O 

RD7-flatik.R 
CAG CTC TTC CCC TCT CGA C 

RD8.flaiik.F 
CAA TCA GGG CTG TGC TAA CC 
RD9

RD10

RD11

RD12

RD13

RD14



coAZ..Rv2075 



Rv0221-Rv0223 



5^cC-Rv3121 



2.0 



1.9 



Rv2645-Rv2695c 11.0 



2.8 



Rvl255c-Rvl257c 3.0 



Rvl765c-Rv 1773c 9.0 



Regions missing from M. tuberculosis H37Rv:
RvD1* 5.0



RvD2* 



plcD



5.1 



RvD3 



1.0 



RD8-ephA,R 
ACT TCC TCC TOA CTA ATC CAG GC 

RD9-intF 
CO A TGG TCA ACA CCA Cr A CO 

RD9-iistR 
CTO GAC CTC OAT GAC CAC TC 

RDlO-intF 
OTA ACC GOT TCA CCD OAA T 

RDlO-iiUR 
GTC AAC TCC ACG GAA AGA CC 
RDl]-RV264fiF 
CGO CAG CTA 0 AC OAC CTC 
K011-Rv264$R 
AACOTGCTG CGATAGGTTTT 

RD12-Rv3120.int.F 
GAA ATA CGA GTG CGC TGA CC 
RD12-Rv3120.iiitR 
CTC TGA ACC ATC GOT GTC G 

RDlSistF 
GGA TOT CAC TCG GAA CGG CA 

RD13inlR 
CAC CGG GCT GAT CGA GCG A 

RD14-RvI769.nil.F 
GTG GAG CAC err GAC CTG AT 

RD14.Rvl769.hilJl 
COT CO A ATA CGA GTC GAA CA 

RvDl-intlF 
AGC GCG TCG AAC ACC GGC 
RvDi-lntlR 
CCTGAATCC OCO CAA TTC CAT 

RvI>2-intlF 
GTT CTC CTG TCG AAC CTC CA 

RvD2-int1R 
ACT TCA CCG GTT TCA TCT CO 

RvD3-intF 
ATC GAT CAG GTC GTC AAT GC 
RvDa-intR 
ACQ CCA CCA TCA AGA TCC 



RD8-flatik.R 
CGA CAG TTG TG C GTA CTG 01 

RD9-flankF 
GTG TAG GTC AGC COC ATCC 

RD9-flankR 
GCC CAACAG CTC GAC ATC 

RDlO-fiankF 
CTG CAA CCA TCC GGT ACA C 

RDlO-ilankR 
GTC ATG AAC GGC GGA CAG 

RDlI-fla-F 
TCA CAT AGG GGC TOC GAT AG 

RDIl-fla-R 
AGA GGA ACC TTf CGG TOG TT 

RDl2-flank.F 
GCC ATC AAC GTC AAG AAC CT 

RD12-flaiik.R 
CGO CCA GGT AAC AAG GAG T 

RDlB-ilaokJ? 
CGA TGG TGT TTC TTG GTG AO 

RDl3-flank.R 
GGA TCG GCT CAO TGA ATA CC 

RDI4-flBiikF 
TTG ATT CGC CAA CAA CTO AA 

RDH-flankR 
GOO CTG GTT ACT GTC GAT TC 

RvDl-im2.F 
GAG CCA CTC CGA TGT TGA CT 

RvDl-int2.R 
CAC GCG AAC OCT ACC T AC AT 

RvD2-iiil2F 
GGA CGG TGA CGG TAT TTG TC 

RvD2*mt2R 
TCG CCA ACT TCT ATG GAC CT 

RvD3-flank.F 
AAA CCA TGC AGC GTC TGC CA 

RvD3-flankR 
GCG TTT CTG CGT CTXS GTT OA 






RvD4* 



PPEgene 



0.8 



RvD5 



moa 



4.0 



TbD1



mmpL6



-2J_ 



RvD4HntF-PPE 
GGT TGC CAA COT TAG CO A TGC 
R\i>4-inlR-PPE 
CCG GTG GTG GTG GCG GCT 
RvDStntF 

OGO TTC ACG TTC ATT ACT GTT C 
RvDSintR 
CX:T GOG CTT ATC TCT AGC GG 

TOPlintSJF 



ND 
ND 

RvD5-flankF 
CCC ATC GTG OTC GTT C AC C 

RvDS-flankR 
GTA CCC GCA CCA CCT GCT G 

TBDIflal-F 



CGT TCA AOC CCA AAC AGG TA 

TBDlintS.R 
AAT CGA ACT CGT GGA ACA CC 



CTA CCT CAT CTT CCG GTC CA 

TBDIflal-R 
CAT AG A TCC CGG ACA TGG TG 



katG, gyrA, oxyR, pncA and mmpL6 PCR and sequencing primers



katG-2154,872-SEQ-R
ACA AGC TGA TCC ACC GAG AC 



gyrA 



95 



oxyR 



pncA 
katG-2154,225-PCR-F
CTA CCA GCA CCG TCA TCT CA 

katG-2155,157-PCR-R
AGG TOG TAT GGA CO AACA CC 

gyrA-7,127-PCR-F
GTT CGT GTG TTG CGT CAA GT 

gyrA-8,312-PCR-R
C AG CTO GGT GTG CIT OTA AA 

- TAT GOG ATC AGG CGT ACT TO 

i»CKR*2726,Q24-PCR-R 
CAA AGC AGT GGT TCA GCA GT 

pncA-2288,678-PCR-F
ATC AGG AGC TGC AAA CCA AC 

pncA-2289,319-PCR-R
OGC GTC ATG GAC OCT ATA TC 

fln7ip£rse(|SF 
GTA TCA GAG GGA CCG AGC AO 

TBDlflnl-R 
CAT AGA TCC CGG ACA TGG TO 

The RD nomenclature used in this table is based on that used by Brosch et al. (2000) (ref. 25) and differs from that proposed by Behr and coworkers (1999) (ref. 6). Primer sequences are shown in the 5' → 3' direction.

* Regions where a second pair of internal primers was used rather than flanking primers, due to flanking repetitive regions and/or mobile genetic elements.



mmpL6 



551



SyrA-7,46l¥ 
CGG GTG ore TAT GCA ATG TT 



CAA AGC AGT GGT TCA GCA GT 



pncA-2289,319-SEQ-R
GGC GTC ATG GAC CCT ATA TC 



GTA TCA GAG GGA CCG AGC AG 
CLAIMS

1. An isolated or purified nucleic acid, wherein said nucleic acid is selected from the group consisting of:

a. SEQ ID N°1;

b. a nucleic acid having a sequence fully complementary to SEQ ID N°1;

c. a nucleic acid fragment comprising at least 15 consecutive nucleotides of SEQ ID N°1;

d. a nucleic acid having at least 90% sequence identity after optimal alignment with a sequence defined in a) or b);

e. a nucleic acid that hybridizes under stringent conditions with the nucleic acid defined in a) or b).

2. A nucleic acid fragment according to claim 1, wherein said nucleic acid fragment is:

- specifically deleted from the genome of Mycobacterium tuberculosis, except in Mycobacterium tuberculosis strains having the mutation CTG -> CGG at codon 463 of gene katG and/or Mycobacterium tuberculosis strains having no or very few IS6110 sequences inserted in their genome; and

- present in the genome of Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis and Mycobacterium bovis BCG.

3. A nucleic acid fragment according to claim 1 or 2 selected from the group consisting of:

a) SEQ ID N°4;

b) a nucleic acid having a sequence fully complementary to SEQ ID N°4;

c) a nucleic acid fragment comprising at least 20 consecutive nucleotides of SEQ ID N°4;

d) a nucleic acid having at least 90% sequence identity after optimal alignment with a sequence defined in a) or b);

e) a nucleic acid that hybridizes under stringent conditions with the nucleic acid defined in a) or b).

4. The isolated or purified nucleic acid of claim 1, wherein said nucleic acid comprises at least a deletion of a nucleic acid fragment according to claim 2 or 3.

5. Isolated or purified polypeptides encoded by the nucleic acid of claims 1 to 4.

6. The polypeptide of claim 5 selected among the polypeptides with sequence SEQ ID N° 5, 7, 9 and 11 and fragments thereof.

7. Isolated or purified nucleic acid encoding the polypeptide according to claim 6.

8. Isolated or purified nucleic acid according to claim 7 selected among:

- SEQ ID N° 4 encoding the polypeptide of SEQ ID N° 5;

- SEQ ID N° 6 encoding the polypeptide of SEQ ID N° 7;

- SEQ ID N° 8 encoding the polypeptide of SEQ ID N° 9;

- SEQ ID N° 10 encoding the polypeptide of SEQ ID N° 11;

and fragments thereof.

9. A recombinant vector comprising a nucleic acid sequence selected among the nucleic acids according to claims 1 to 4, 7 and 8.

10. The recombinant vector of claim 9, consisting of the vector named X229 introduced into the recombinant Escherichia coli deposited with the CNCM on February 18, 2002 under N° I-2799.

11. A recombinant cell comprising a nucleic acid sequence selected among the nucleic acids according to claims 1 to 4, 7 and 8, or a vector according to claim 9 or 10.

12. The recombinant cell according to claim 11, consisting of the Escherichia coli deposited with the CNCM on February 18, 2002 under N° I-2799.

13. A method for the discriminatory detection and identification of:

- Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the mutation CTG -> CGG at codon 463 of gene katG and/or Mycobacterium tuberculosis strains having no or very few IS6110 sequences inserted in their genome; versus

- Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis and Mycobacterium bovis BCG in a biological sample,

comprising the following steps:

a) isolation of the DNA from the biological sample to be analyzed, or production of a cDNA from the RNA of the biological sample;

b) detection of the nucleic acid sequences of the mycobacterium present in said biological sample;

c) analysis for the presence or absence of a nucleic acid fragment according to claim 2 or 3.

14. The method as claimed in claim 13, in which the detection of the mycobacterial DNA sequences is carried out using nucleotide sequences complementary to said DNA sequences.

15. The method as claimed in claim 13 or 14, in which the detection of the mycobacterial DNA sequences is carried out by amplification of these sequences using primers.

16. The method as claimed in claim 15, in which the primers have a nucleotide sequence chosen from the group comprising SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, SEQ ID N°16, SEQ ID N°17 and SEQ ID N°18.

17. A method for the discriminatory detection and identification of: 

- Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the
mutation CTG -> CGG at codon 463 of the katG gene and/or Mycobacterium
tuberculosis strains having no or very few IS6110 sequences inserted in their genome; versus,

- Mycobacterium africanum, Mycobacterium canetti, Mycobacterium microti,
Mycobacterium bovis, Mycobacterium bovis BCG in a biological sample,

comprising the following steps: 

a) bringing the biological sample to be analyzed into contact with at least one
pair of primers as defined in claim 15 or 16, the DNA contained in the sample having been,
where appropriate, made accessible to the hybridization beforehand,

b) amplification of the DNA of the mycobacterium, 

c) visualization of the amplification of the DNA fragments. 

18. A kit for the discriminatory detection and identification of:

- Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the
mutation CTG -> CGG at codon 463 of the katG gene and/or Mycobacterium
tuberculosis strains having no or very few IS6110 sequences inserted in their genome; versus,

- Mycobacterium africanum, Mycobacterium canetti, Mycobacterium microti,
Mycobacterium bovis, Mycobacterium bovis BCG,

in a biological sample, comprising the following elements:

a) at least one pair of primers as defined in claim 15 or 16, 

b) the reagents necessary to carry out a DNA amplification reaction,

c) optionally, the necessary components which make it possible to verify or
compare the sequence and/or the size of the amplified fragment.

19. The use of at least one pair of primers as defined in claim 15 or 16 for the
amplification of a DNA sequence from Mycobacterium tuberculosis or Mycobacterium
africanum, Mycobacterium canetti, Mycobacterium microti, Mycobacterium bovis,
Mycobacterium bovis BCG, or Mycobacterium tuberculosis having the mutation CTG -> CGG
at codon 463 of the katG gene and/or having no or very few IS6110 sequences inserted in their
genome.

20. A product of expression of all or part of the nucleic acid fragment as defined in
claim 2 or 3, deleted from the genome of Mycobacterium tuberculosis and present in
Mycobacterium africanum, Mycobacterium canetti, Mycobacterium microti, Mycobacterium
bovis, Mycobacterium bovis BCG, and Mycobacterium tuberculosis having the mutation CTG ->
CGG at codon 463 of the katG gene and/or having no or very few IS6110 sequences inserted in
their genome; or conversely.

21. A method for the discriminatory detection in vitro of antibodies directed against
Mycobacterium tuberculosis or Mycobacterium africanum, Mycobacterium canetti,
Mycobacterium microti, Mycobacterium bovis, Mycobacterium bovis BCG, or Mycobacterium
tuberculosis having the mutation CTG -> CGG at codon 463 of the katG gene and/or having no
or very few IS6110 sequences inserted in their genome, in a biological sample, comprising
the following steps:

a) bringing the biological sample into contact with at least one product as defined 
in claim 20, 

b) detecting the antigen-antibody complex formed.
22. A method for the discriminatory detection of a vaccination with Mycobacterium
bovis BCG or an infection by Mycobacterium tuberculosis, except Mycobacterium
tuberculosis strains having the mutation CTG -> CGG at codon 463 of the katG gene and/or
Mycobacterium tuberculosis strains having no or very few IS6110 sequences
inserted in their genome, in a mammal, comprising the following steps:

a) preparation of a biological sample containing cells, more particularly cells of 
the immune system of said mammal and more particularly T cells, 

b) incubation of the biological sample of step a) with at least one product as 
defined in claim 20, 

c) detection of a cellular reaction indicating prior sensitization of the mammal to
said product, in particular cell proliferation and/or synthesis of proteins such as
gamma-interferon.

23. A kit for the in vitro diagnosis of Mycobacterium tuberculosis infection, except
infection with Mycobacterium tuberculosis strains having the mutation CTG -> CGG at
codon 463 of the katG gene and/or having no or very few IS6110 sequences inserted in their
genome, in a mammal optionally vaccinated beforehand with M. bovis BCG, comprising:

a) a product as defined in claim 20, 

b) where appropriate, the reagents for the constitution of the medium suitable
for the immunological reaction,

c) the reagents allowing the detection of the antigen-antibody complexes 
produced by the immunological reaction, 

d) where appropriate, a reference biological sample (negative control) free of 
antibodies recognized by said product, 

e) where appropriate, a reference biological sample (positive control) containing a 
predetermined quantity of antibodies recognized by said product. 

24. A mono- or polyclonal antibody, its chimeric fragments or antibodies, characterized
in that they are capable of specifically recognizing a product as defined in claim 20.

25. A method for the discriminatory detection of the presence of an antigen of
Mycobacterium tuberculosis or Mycobacterium africanum, Mycobacterium canetti,
Mycobacterium microti, Mycobacterium bovis, Mycobacterium bovis BCG, or Mycobacterium
tuberculosis having the mutation CTG -> CGG at codon 463 of the katG gene and/or having no
or very few IS6110 sequences inserted in their genome, in a biological sample, comprising the
following steps:

a) bringing the biological sample into contact with an antibody as claimed in
claim 24,

b) detecting the antigen-antibody complex formed. 

26. A kit for the discriminatory detection of the presence of an antigen of
Mycobacterium tuberculosis or Mycobacterium africanum, Mycobacterium canetti,
Mycobacterium microti, Mycobacterium bovis, Mycobacterium bovis BCG, or Mycobacterium
tuberculosis having the mutation CTG -> CGG at codon 463 of the katG gene and/or having no
or very few IS6110 sequences inserted in their genome, in a biological sample, comprising
the following elements:

a) an antibody as claimed in claim 24, 

b) the reagents for constituting the medium suitable for the immunological
reaction,

c) the reagents allowing the detection of the antigen-antibody complexes produced
by the immunological reaction.

27. An immunological composition, characterized in that it comprises at least one
product as defined in claim 20.

28. A vaccine, characterized in that it comprises at least one product as defined in claim
20 in combination with a pharmaceutically compatible vehicle and, where appropriate, one
or more appropriate immunity adjuvants.

29. An in vitro method for the detection and identification of Mycobacterium tuberculosis,
except Mycobacterium tuberculosis strains having the mutation CTG -> CGG at codon 463
of the katG gene and/or Mycobacterium tuberculosis strains having no or very few
IS6110 sequences inserted in their genome, in a biological sample,
comprising the following steps:

a) isolation of the DNA from the biological sample to be analyzed or 
production of a cDNA from the RNA of the biological sample, 
b) detection of the nucleic acid sequences of the mycobacterium present in said
biological sample,

c) analysis for the presence or the absence of a nucleic acid fragment according to
claim 2 or 3.

30. An in vitro method for the detection and identification of Mycobacterium
tuberculosis, except Mycobacterium tuberculosis strains having the mutation CTG -> CGG
at codon 463 of the katG gene and/or Mycobacterium tuberculosis strains having no or
very few IS6110 sequences inserted in their genome, in a biological sample,

comprising the following steps:

a) bringing the biological sample to be analyzed into contact with at least one pair of
primers selected among nucleic acid fragments according to claims 1 to 4, 7 and 8, and more
preferably selected among the primers chosen from the group comprising SEQ ID N°13,
SEQ ID N°14, SEQ ID N°15, SEQ ID N°16, SEQ ID N°17 and SEQ ID N°18, the DNA
contained in the sample having been, where appropriate, made accessible to the
hybridization beforehand,

b) amplification of the DNA of the mycobacterium, 

c) visualization of the amplification of the DNA fragments.

31. A kit for the detection and identification of Mycobacterium tuberculosis, except
Mycobacterium tuberculosis strains having the mutation CTG -> CGG at codon 463 of the
katG gene and/or Mycobacterium tuberculosis strains having no or very few IS6110
sequences inserted in their genome, in a biological sample, comprising the following
elements:

a) at least one pair of primers selected among nucleic acid fragments according
to claims 1 to 4, 7 and 8, and more preferably selected among the primers chosen from the
group comprising SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, SEQ ID N°16, SEQ ID
N°17 and SEQ ID N°18,

b) the reagents necessary to carry out a DNA amplification reaction, 

c) optionally, the necessary components which make it possible to verify or 
compare the sequence and/or the size of the amplified fragment. 

32. A method for the detection in vitro of antibodies directed against Mycobacterium
tuberculosis, except Mycobacterium tuberculosis strains having the mutation CTG -> CGG at
codon 463 of the katG gene and/or having no or very few IS6110 sequences inserted in their
genome, in a biological sample, comprising the following steps:

a) bringing the biological sample into contact with at least one product as defined 
in claim 20, 

b) detecting the antigen-antibody complex formed.

33. Use of the TbD1 deletion as a genetic marker for the differentiation of Mycobacterium
strains of the Mycobacterium tuberculosis complex.

34. Use of the mmpL6551 polymorphism as a genetic marker for the differentiation of
Mycobacterium strains of the Mycobacterium tuberculosis complex.

35. Use of the genetic marker according to claim 33 and/or 34 in association with at least
one genetic marker selected among RD1, RD2, RD3, RD4, RD5, RD6, RD7, RD8, RD9,
RD10, RD11, RD13, RD14, RvD1, RvD2, RvD3, RvD4, RvD5, katG463, gyrA95, oxyR'285
and pncA57 for the differentiation of Mycobacterium strains of the Mycobacterium
tuberculosis complex.

36. An in vitro method for the detection and identification of Mycobacteria from the
Mycobacterium tuberculosis complex in a biological sample, comprising the following steps:

c) analysis for the presence or the absence of a nucleic acid fragment of the sequence
TbD1 according to claim 2 or 3, and

d) analysis of at least one additional genetic marker selected among RD1, RD2, RD3,
RD4, RD5, RD6, RD7, RD8, RD9, RD10, RD11, RD13, RD14, RvD1, RvD2,
RvD3, RvD4, RvD5, katG463, gyrA95, oxyR'285 and pncA57.

37. The in vitro method of claim 36, wherein two additional markers are used, preferably
RD4 and RD9.

38. The method according to claim 36, wherein the analysis is performed by a technique
selected from sequence hybridization, nucleic acid amplification and antigen-antibody
complex detection.
39. A kit for the detection and identification of Mycobacteria from the Mycobacterium
tuberculosis complex in a biological sample, comprising the following elements:

a) at least one pair of primers selected among nucleic acid fragments according
to claims 1 to 4, 7 and 8, and more preferably selected among the primers
chosen from the group comprising SEQ ID N°13, SEQ ID N°14, SEQ ID
N°15, SEQ ID N°16, SEQ ID N°17 and SEQ ID N°18,

b) at least one pair of primers specific for the genetic markers selected among
RD1, RD2, RD3, RD4, RD5, RD6, RD7, RD8, RD9, RD10, RD11, RD13,
RD14, RvD1, RvD2, RvD3, RvD4, RvD5, katG463, gyrA95, oxyR'285 and pncA57,

c) the reagents necessary to carry out a DNA amplification reaction,

d) optionally, the necessary components which make it possible to verify or
compare the sequence and/or the size of the amplified fragment.



40. A kit according to claim 39, comprising the following elements:

a) at least one pair of primers selected among nucleic acid fragments according to
claims 1 to 4, 7 and 8, and more preferably selected among the primers chosen
from the group comprising SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, SEQ ID
N°16, SEQ ID N°17 and SEQ ID N°18,

b) one pair of primers specific for the genetic marker RD4,

c) one pair of primers specific for the genetic marker RD9,

d) the reagents necessary to carry out a DNA amplification reaction,

e) optionally, the necessary components which make it possible to verify or
compare the sequence and/or the size of the amplified fragment.



PATENT APPLICATION 



TITLE: DELETED SEQUENCE IN M. TUBERCULOSIS, METHOD FOR
DETECTING MYCOBACTERIA USING THESE SEQUENCES AND
VACCINES

APPLICANTS: INSTITUT PASTEUR

ABSTRACT: 

The present invention relates to the identification of a nucleotide sequence which makes it
possible, in particular, to distinguish an infection resulting from Mycobacterium tuberculosis
from an infection resulting from Mycobacterium africanum, Mycobacterium canetti,
Mycobacterium microti, Mycobacterium bovis or Mycobacterium bovis BCG. The present
invention also relates to a method for detecting the sequences in question by the products
of expression of these sequences, and to kits for carrying out these methods. Finally, the
present invention relates to novel vaccines.
SEQUENCE LISTING

<110> Institut PASTEUR 

<120> DELETED SEQUENCE IN M. TUBERCULOSIS, METHOD FOR
DETECTING MYCOBACTERIA USING THESE SEQUENCES AND 
VACCINES 

<130> D20110 

<160> 20 

<170> PatentIn Ver. 2.1

<210> 1 
<211> 3953 
<212> DNA 

<213> Mycobacterium tuberculosis 

<220> 
<221> CDS 

<222> (735)..(3638)
<400> 1

tccagcgcgg ccatcagcga tgaactctgg gacctgctac ccggctacct catcttccgg 60
tccatcatcc ccaaccggcc gcccacccag gacacggtgc aagccctcgt cgacgacgtg 120
atactcccca gcctcacccg atccaccggt tgagtcagcg gtgcgaatgg ctgggcaccg 180
ttgtggtgtc cggtcccgta ccgtactgtt gaatccgcgg atccccgcct gaggtacggg 240
gcgtggtcgc gccccgggca atagcgtcgc cggttatcga aaggctaacg ggtgcagggg 300
atttcagtga ctggcctggt caaacgcggc tggatggtgc tggttgccgt ggcggtggtg 360
gcggtcgcgg gattcagcgt ctatcggttg cacggcatct tcggctcgca cgacaccacc 420
tcgaccgccg gtggtgtcgc gaacgacatc aagccgttca accccaaaca ggtaaccctc 480
gaggtctttg gcgctcccgg aaccgtggca acgatcaatt atctggacgt ggatgccaca 540
cctcggcaag tcctggacac gaccctgccg tggtcataca cgatcacgac gaccctgccc 600
gcggtcttcg ccaatgttgt cgcgcaaggc gacagcaatt ccatcggctg ccgcatcacc 660
gtcaacggtg tagtcaagga cgaaaggatc gtcaacgaag tgcgcgccta taccttctgc 720

ctcgacaagt cctc atg agc aac cac cac cgc ccg cgg cct tgg ttg ccg 770
Met Ser Asn His His Arg Pro Arg Pro Trp Leu Pro
1 5 10

cac acc atc cga cgg ctt tcg ttg ccg atc ttg ctg ttt tgg gtg ggt 818
His Thr Ile Arg Arg Leu Ser Leu Pro Ile Leu Leu Phe Trp Val Gly
15 20 25

gtg gcc gcc ata acc aat gcc gcc gtg ccg caa ttg gag gtg gtc ggg 866
Val Ala Ala Ile Thr Asn Ala Ala Val Pro Gln Leu Glu Val Val Gly
30 35 40

gag gcg cat aac gtc gca cag agc tcc ccg gat gac ccg tcg ctg cag 914
Glu Ala His Asn Val Ala Gln Ser Ser Pro Asp Asp Pro Ser Leu Gln
45 50 55 60

gcg atg aaa cgc atc ggc aag gtg ttc cac gag ttc gat tcc gac agt 962
Ala Met Lys Arg Ile Gly Lys Val Phe His Glu Phe Asp Ser Asp Ser
65 70 75

gcg gcc atg atc gtc ttg gaa ggc gat aag ccg ctc ggc aac gac gcc 1010
Ala Ala Met Ile Val Leu Glu Gly Asp Lys Pro Leu Gly Asn Asp Ala
80 85 90

cac cgg ttc tac gac acc ctg ctc cgc aac ctt tca aac gac acc aaa 1058
His Arg Phe Tyr Asp Thr Leu Leu Arg Asn Leu Ser Asn Asp Thr Lys
95 100 105

cac gtc gag cac gtt cag gac ttc tgg ggc gat ccg ctg acc gcg gcc 1106
His Val Glu His Val Gln Asp Phe Trp Gly Asp Pro Leu Thr Ala Ala
110 115 120

ggc tcg caa agc acc gac ggc aaa gcc gcc tac gtt cag gtc tat ctc 1154
Gly Ser Gln Ser Thr Asp Gly Lys Ala Ala Tyr Val Gln Val Tyr Leu
125 130 135 140 

gcc ggc aac caa ggc gag gcg ttg tca atc gag tcc gtc gac gcg gtg 1202
Ala Gly Asn Gln Gly Glu Ala Leu Ser Ile Glu Ser Val Asp Ala Val
145 150 155 

cgc gac atc gtc gcc cat acg cca cca ccg gcc ggg gtc aag gcc tac 1250
Arg Asp Ile Val Ala His Thr Pro Pro Pro Ala Gly Val Lys Ala Tyr
160 165 170 

gtc acc ggc gcg gcc ccg ctc atg gcc gat cag ttt cag gtg ggc agc 1298
Val Thr Gly Ala Ala Pro Leu Met Ala Asp Gln Phe Gln Val Gly Ser
175 180 185 

aaa gga acc gcg aaa gtt acc ggg ata act ctg gtt gtg atc gcg gtg 1346
Lys Gly Thr Ala Lys Val Thr Gly Ile Thr Leu Val Val Ile Ala Val
190 195 200 

atg ttg ctc ttc gta tac cgt tcc gtc gtc acc atg gtc ctg gtg ctt 1394
Met Leu Leu Phe Val Tyr Arg Ser Val Val Thr Met Val Leu Val Leu
205 210 215 220 

atc acg gtt ctt att gag ttg gcc gcg gcc cgc ggg atc gtc gct ttt 1442
Ile Thr Val Leu Ile Glu Leu Ala Ala Ala Arg Gly Ile Val Ala Phe
225 230 235

ctc gga aac gcc ggg gta atc ggg ctg tcg aca tac tcg acg aat ctg 1490
Leu Gly Asn Ala Gly Val Ile Gly Leu Ser Thr Tyr Ser Thr Asn Leu
240 245 250

ctc aca cta ttg gta atc gcg gcg ggc aca gac tac gcg att ttt gtc 1538
Leu Thr Leu Leu Val Ile Ala Ala Gly Thr Asp Tyr Ala Ile Phe Val
255 260 265

ctc ggc cgc tat cac gag gcg cgc tac gcc gca cag gat cgg gaa acg 1586
Leu Gly Arg Tyr His Glu Ala Arg Tyr Ala Ala Gln Asp Arg Glu Thr
270 275 280 

gcc ttc tac acg atg tat cgc ggg acc gcc cac gtc gtc ttg ggc tcg 1634
Ala Phe Tyr Thr Met Tyr Arg Gly Thr Ala His Val Val Leu Gly Ser
285 290 295 300

ggt ctg acc gtt gcc ggc gcg gtg tat tgc ctg agc ttt acc cgg cta 1682
Gly Leu Thr Val Ala Gly Ala Val Tyr Cys Leu Ser Phe Thr Arg Leu
305 310 315 

ccc tat ttt caa agc ctg ggt att ccc gcc tcg ata ggg gtg atg att 1730
Pro Tyr Phe Gln Ser Leu Gly Ile Pro Ala Ser Ile Gly Val Met Ile
320 325 330 

gcg ttg gca gcc gcg ctc agc ctg gcc cca tcc gtg ctc atc ttg ggc 1778
Ala Leu Ala Ala Ala Leu Ser Leu Ala Pro Ser Val Leu Ile Leu Gly
335 340 345 

agt cgt ttc ggt tgt ttc gaa ccc aag cgc agg atg agg acc agg gga 1826
Ser Arg Phe Gly Cys Phe Glu Pro Lys Arg Arg Met Arg Thr Arg Gly
350 355 360 

tgg cgg cgc atc ggc acg gcc atc gtg cgt tgg ccg gga ccc atc ctg 1874
Trp Arg Arg Ile Gly Thr Ala Ile Val Arg Trp Pro Gly Pro Ile Leu
365 370 375 380 

gca gtg gcg tgc gca att gcg gtg gtg ggt ctg ctc gcg ctg ccg gga 1922
Ala Val Ala Cys Ala Ile Ala Val Val Gly Leu Leu Ala Leu Pro Gly
385 390 395 

tac aaa acg agc tac gac gct cgc tat tac atg ccc gcc acc gcc ccg 1970
Tyr Lys Thr Ser Tyr Asp Ala Arg Tyr Tyr Met Pro Ala Thr Ala Pro 
400 405 410 

gcc aat att ggc tac atg gcc gcg gag cga cat ttt ccc caa gcg cgg 2018
Ala Asn Ile Gly Tyr Met Ala Ala Glu Arg His Phe Pro Gln Ala Arg
415 420 425 

ctg aat ccc gaa cta ctg atg atc gag acg gat cac gat atg cgc aat 2066
Leu Asn Pro Glu Leu Leu Met Ile Glu Thr Asp His Asp Met Arg Asn
430 435 440 

ccg gcc gac atg ctc atc ttg gat agg atc gcc aag gct gtc ttc cat 2114
Pro Ala Asp Met Leu Ile Leu Asp Arg Ile Ala Lys Ala Val Phe His
445 450 455 460 

ctg ccc ggc ata ggg ctg gtg cag gcc atg acc cgg ccg cta gga acc 2162
Leu Pro Gly Ile Gly Leu Val Gln Ala Met Thr Arg Pro Leu Gly Thr
465 470 475 

ccg att gac cac agc tcg ata ccg ttt cag atc agc atg caa agc gtc 2210
Pro Ile Asp His Ser Ser Ile Pro Phe Gln Ile Ser Met Gln Ser Val
480 485 490 

ggc cag att cag aat ctc aag tat cag agg gac cga gca gcc gac ttg 2258
Gly Gln Ile Gln Asn Leu Lys Tyr Gln Arg Asp Arg Ala Ala Asp Leu
495 500 505 

ctg aag cag gcc gaa gag ctg ggg aag acg atc gaa atc ttg cag cgc 2306
Leu Lys Gln Ala Glu Glu Leu Gly Lys Thr Ile Glu Ile Leu Gln Arg
510 515 520 

caa tat gcc cta cag cag gaa ctc gcg gcc gct act cac gag caa gcc 2354
Gln Tyr Ala Leu Gln Gln Glu Leu Ala Ala Ala Thr His Glu Gln Ala
525 530 535 540 
gaa agc ttt cac caa acg atc gcc acg gta aac gaa ctg cga gat agg 2402
Glu Ser Phe His Gln Thr Ile Ala Thr Val Asn Glu Leu Arg Asp Arg
545 550 555

atc gcc aat ttc gac gat ttc ttc agg ccg att cgt agt tac ttt tac 2450
Ile Ala Asn Phe Asp Asp Phe Phe Arg Pro Ile Arg Ser Tyr Phe Tyr
560 565 570

tgg gaa aag cac tgc tac gat atc ccg agc tgc tgg gcg ctg aga tcc 2498
Trp Glu Lys His Cys Tyr Asp Ile Pro Ser Cys Trp Ala Leu Arg Ser
575 580 585 

gtc ttt gac acg atc gac ggt atc gac caa ctc ggc gag cag ctg gcc 2546
Val Phe Asp Thr Ile Asp Gly Ile Asp Gln Leu Gly Glu Gln Leu Ala
590 595 600 



agc gtg acc gta acc ttg gac aag ttg gct gcg atc cag cct caa ttg 2594
Ser Val Thr Val Thr Leu Asp Lys Leu Ala Ala Ile Gln Pro Gln Leu
605 610 615 620 

gtg gcg ctg cta cca gac gag atc gcc agc cag cag atc aat cgg gaa 2642
Val Ala Leu Leu Pro Asp Glu Ile Ala Ser Gln Gln Ile Asn Arg Glu
625 630 635 

ctg gcg ctg gct aac tac gcc acc atg tcc ggg atc tat gcc cag acg 2690
Leu Ala Leu Ala Asn Tyr Ala Thr Met Ser Gly Ile Tyr Ala Gln Thr
640 645 650 

gcg gcc ttg atc gaa aac gct gcc gcc atg gga caa gcc ttt gac gcc 2738
Ala Ala Leu Ile Glu Asn Ala Ala Ala Met Gly Gln Ala Phe Asp Ala
655 660 665 

gcc aag aac gac gac tcc ttc tat ctg ccg ccg gag gct ttt gac aac 2786
Ala Lys Asn Asp Asp Ser Phe Tyr Leu Pro Pro Glu Ala Phe Asp Asn 
670 675 680 

cca gat ttc cag cgc ggc ctg aaa ttg ttc ctg tcg gca gac ggt aag 2834
Pro Asp Phe Gln Arg Gly Leu Lys Leu Phe Leu Ser Ala Asp Gly Lys
685 690 695 700 

gcg gct cgg atg atc atc tcc cat gaa ggc gat ccc gcc acc ccc gaa 2882
Ala Ala Arg Met Ile Ile Ser His Glu Gly Asp Pro Ala Thr Pro Glu
705 710 715 

ggc att tcg cat atc gac gcg atc aag cag gcg gcc cac gag gcc gtg 2930
Gly Ile Ser His Ile Asp Ala Ile Lys Gln Ala Ala His Glu Ala Val
720 725 730 

aag ggc act ccc atg gcg ggt gct ggg atc tat ctg gcc ggc acg gcc 2978
Lys Gly Thr Pro Met Ala Gly Ala Gly Ile Tyr Leu Ala Gly Thr Ala
735 740 745 

gcc acc ttc aag gac att caa gac ggc gcc acc tac gac ctc ctg atc 3026
Ala Thr Phe Lys Asp Ile Gln Asp Gly Ala Thr Tyr Asp Leu Leu Ile
750 755 760 

gcc gga ata gcc gcg ctg agc ttg att ttg ctc atc atg atg atc att 3074
Ala Gly Ile Ala Ala Leu Ser Leu Ile Leu Leu Ile Met Met Ile Ile
765 770 775 780 
acc cga agc ctg gtt gcg gcg ctg gtg atc gtg ggc acg gtg gcg ctg 3122
Thr Arg Ser Leu Val Ala Ala Leu Val Ile Val Gly Thr Val Ala Leu
785 790 795 

tcg ttg ggc gct tct ttt ggc ctg tcc gtg ctg gtg tgg cag cat ctt 3170
Ser Leu Gly Ala Ser Phe Gly Leu Ser Val Leu Val Trp Gln His Leu
800 805 810 

ctc ggt atc cag ttg tac tgg atc gtg ctc gcg ctg gcc gtc atc ctg 3218
Leu Gly Ile Gln Leu Tyr Trp Ile Val Leu Ala Leu Ala Val Ile Leu
815 820 825 

ctc ctg gcc gtg gga tcg gac tat aac ttg ctg ctg att tcc cga ttc 3266
Leu Leu Ala Val Gly Ser Asp Tyr Asn Leu Leu Leu Ile Ser Arg Phe
830 835 840 

aag gag gag atc ggt gca ggt ttg aac acc ggc atc atc cgt gcg atg 3314
Lys Glu Glu Ile Gly Ala Gly Leu Asn Thr Gly Ile Ile Arg Ala Met
845 850 855 860 

gcc ggc acc ggc ggg gtg gtg acc gct gcc ggc ctg gtg ttc gcc gcc 3362
Ala Gly Thr Gly Gly Val Val Thr Ala Ala Gly Leu Val Phe Ala Ala 
865 870 875 

act atg tct tcg ttc gtg ttc agt gat ttg cgg gtc ctc ggt cag atc 3410
Thr Met Ser Ser Phe Val Phe Ser Asp Leu Arg Val Leu Gly Gln Ile
880 885 890 

ggg acc acc att ggt ctt ggg ctg ctg ttc gac acg ctg gtg gtg cgc 3458
Gly Thr Thr Ile Gly Leu Gly Leu Leu Phe Asp Thr Leu Val Val Arg
895 900 905 

gcg ttc atg acc ccg tcc atc gcg gtg ctg ctc ggg cgc tgg ttc tgg 3506
Ala Phe Met Thr Pro Ser Ile Ala Val Leu Leu Gly Arg Trp Phe Trp
910 915 920 

tgg ccg caa cga gtg cgc ccg cgc cat gcc agc agg atg ctt cgg ccg 3554
Trp Pro Gln Arg Val Arg Pro Arg Pro Ala Ser Arg Met Leu Arg Pro
925 930 935 940 

tac ggc ccg cgg ccc gtg gtt cgt gaa ttg ctg ctg cgc gag ggc aac 3602
Tyr Gly Pro Arg Pro Val Val Arg Glu Leu Leu Leu Arg Glu Gly Asn 
945 950 955 

gat gac ccg aga act cag gtg gct acc cac cgt taa ggtggtggga 3648
Asp Asp Pro Arg Thr Gln Val Ala Thr His Arg
960 965 

tgccgctttc aggggaatat gcgccgagcc cgctcgactg gtcgcgcgag caagccgaca 3708

cgtatatgaa gtccggcgga accgagggca cacagctgca gggaaagccg gtcatcctgc 3768

tcaccaccgt cggggcgaag accggcaaac tccgtaagac cccgctgatg cgcgtcgagc 3828

acgacggcca gtacgcgatc gtcgcctcgc tgggtggggc gccgaaaaat ccggtctggt 3888

accacaacgt cgtgaagaac ccacgggtcg agctgcagga cggcaccgga ccggcgacta 3948

cgacg 3953 
<210> 2
<211> 967 
<212> PRT 

<213> Mycobacterium tuberculosis



<220> 

<223> mmpL6 protein



<400> 2 

Met Ser Asn His His Arg Pro Arg Pro Trp Leu Pro His Thr Ile Arg
1 5 10 15

Arg Leu Ser Leu Pro Ile Leu Leu Phe Trp Val Gly Val Ala Ala Ile
20 25 30

Thr Asn Ala Ala Val Pro Gln Leu Glu Val Val Gly Glu Ala His Asn
35 40 45

Val Ala Gln Ser Ser Pro Asp Asp Pro Ser Leu Gln Ala Met Lys Arg
50 55 60 

Ile Gly Lys Val Phe His Glu Phe Asp Ser Asp Ser Ala Ala Met Ile
65 70 75 80 

Val Leu Glu Gly Asp Lys Pro Leu Gly Asn Asp Ala His Arg Phe Tyr
85 90 95

Asp Thr Leu Leu Arg Asn Leu Ser Asn Asp Thr Lys His Val Glu His
100 105 110 

Val Gln Asp Phe Trp Gly Asp Pro Leu Thr Ala Ala Gly Ser Gln Ser
115 120 125 

Thr Asp Gly Lys Ala Ala Tyr Val Gln Val Tyr Leu Ala Gly Asn Gln
130 135 140 

Gly Glu Ala Leu Ser Ile Glu Ser Val Asp Ala Val Arg Asp Ile Val
145 150 155 160 

Ala His Thr Pro Pro Pro Ala Gly Val Lys Ala Tyr Val Thr Gly Ala 
165 170 175 

Ala Pro Leu Met Ala Asp Gln Phe Gln Val Gly Ser Lys Gly Thr Ala
180 185 190 

Lys Val Thr Gly Ile Thr Leu Val Val Ile Ala Val Met Leu Leu Phe
195 200 205 

Val Tyr Arg Ser Val Val Thr Met Val Leu Val Leu Ile Thr Val Leu
210 215 220 

Ile Glu Leu Ala Ala Ala Arg Gly Ile Val Ala Phe Leu Gly Asn Ala
225 230 235 240

Gly Val Ile Gly Leu Ser Thr Tyr Ser Thr Asn Leu Leu Thr Leu Leu
245 250 255 

Val Ile Ala Ala Gly Thr Asp Tyr Ala Ile Phe Val Leu Gly Arg Tyr
260 265 270 

His Glu Ala Arg Tyr Ala Ala Gln Asp Arg Glu Thr Ala Phe Tyr Thr
275 280 285

Met Tyr Arg Gly Thr Ala His Val Val Leu Gly Ser Gly Leu Thr Val 
290 295 300 

Ala Gly Ala Val Tyr Cys Leu Ser Phe Thr Arg Leu Pro Tyr Phe Gln
305 310 315 320 

Ser Leu Gly Ile Pro Ala Ser Ile Gly Val Met Ile Ala Leu Ala Ala
325 330 335 

Ala Leu Ser Leu Ala Pro Ser Val Leu Ile Leu Gly Ser Arg Phe Gly
340 345 350 

Cys Phe Glu Pro Lys Arg Arg Met Arg Thr Arg Gly Trp Arg Arg Ile
355 360 365 

Gly Thr Ala Ile Val Arg Trp Pro Gly Pro Ile Leu Ala Val Ala Cys
370 375 380 

Ala Ile Ala Val Val Gly Leu Leu Ala Leu Pro Gly Tyr Lys Thr Ser
385 390 395 400 

Tyr Asp Ala Arg Tyr Tyr Met Pro Ala Thr Ala Pro Ala Asn Ile Gly
405 410 415 

Tyr Met Ala Ala Glu Arg His Phe Pro Gln Ala Arg Leu Asn Pro Glu
420 425 430 

Leu Leu Met Ile Glu Thr Asp His Asp Met Arg Asn Pro Ala Asp Met
435 440 445 

Leu Ile Leu Asp Arg Ile Ala Lys Ala Val Phe His Leu Pro Gly Ile
450 455 460 

Gly Leu Val Gln Ala Met Thr Arg Pro Leu Gly Thr Pro Ile Asp His
465 470 475 480 

Ser Ser Ile Pro Phe Gln Ile Ser Met Gln Ser Val Gly Gln Ile Gln
485 490 495 

Asn Leu Lys Tyr Gln Arg Asp Arg Ala Ala Asp Leu Leu Lys Gln Ala
500 505 510 

Glu Glu Leu Gly Lys Thr Ile Glu Ile Leu Gln Arg Gln Tyr Ala Leu
515 520 525 

Gln Gln Glu Leu Ala Ala Ala Thr His Glu Gln Ala Glu Ser Phe His
530 535 540 

Gln Thr Ile Ala Thr Val Asn Glu Leu Arg Asp Arg Ile Ala Asn Phe
545 550 555 560 

Asp Asp Phe Phe Arg Pro Ile Arg Ser Tyr Phe Tyr Trp Glu Lys His
565 570 575 

Cys Tyr Asp Ile Pro Ser Cys Trp Ala Leu Arg Ser Val Phe Asp Thr
580 585 590 

Ile Asp Gly Ile Asp Gln Leu Gly Glu Gln Leu Ala Ser Val Thr Val
595 600 605 
Thr Leu Asp Lys Leu Ala Ala Ile Gln Pro Gln Leu Val Ala Leu Leu
610 615 620 

Pro Asp Glu Ile Ala Ser Gln Gln Ile Asn Arg Glu Leu Ala Leu Ala
625 630 635 640 

Asn Tyr Ala Thr Met Ser Gly Ile Tyr Ala Gln Thr Ala Ala Leu Ile
645 650 655 

Glu Asn Ala Ala Ala Met Gly Gln Ala Phe Asp Ala Ala Lys Asn Asp
660 665 670 

Asp Ser Phe Tyr Leu Pro Pro Glu Ala Phe Asp Asn Pro Asp Phe Gln
675 680 685 

Arg Gly Leu Lys Leu Phe Leu Ser Ala Asp Gly Lys Ala Ala Arg Met
690 695 700 

Ile Ile Ser His Glu Gly Asp Pro Ala Thr Pro Glu Gly Ile Ser His
705 710 715 720 

Ile Asp Ala Ile Lys Gln Ala Ala His Glu Ala Val Lys Gly Thr Pro
725 730 735 

Met Ala Gly Ala Gly Ile Tyr Leu Ala Gly Thr Ala Ala Thr Phe Lys
740 745 750 

Asp Ile Gln Asp Gly Ala Thr Tyr Asp Leu Leu Ile Ala Gly Ile Ala
755 760 765 

Ala Leu Ser Leu Ile Leu Leu Ile Met Met Ile Ile Thr Arg Ser Leu
770 775 780 

Val Ala Ala Leu Val Ile Val Gly Thr Val Ala Leu Ser Leu Gly Ala
785 790 795 800 

Ser Phe Gly Leu Ser Val Leu Val Trp Gln His Leu Leu Gly Ile Gln
805 810 815 

Leu Tyr Trp Ile Val Leu Ala Leu Ala Val Ile Leu Leu Leu Ala Val
820 825 830 

Gly Ser Asp Tyr Asn Leu Leu Leu Ile Ser Arg Phe Lys Glu Glu Ile
835 840 845 

Gly Ala Gly Leu Asn Thr Gly Ile Ile Arg Ala Met Ala Gly Thr Gly
850 855 860 

Gly Val Val Thr Ala Ala Gly Leu Val Phe Ala Ala Thr Met Ser Ser 
865 870 875 880 

Phe Val Phe Ser Asp Leu Arg Val Leu Gly Gln Ile Gly Thr Thr Ile
885 890 895 

Gly Leu Gly Leu Leu Phe Asp Thr Leu Val Val Arg Ala Phe Met Thr 
900 905 910 

Pro Ser Ile Ala Val Leu Leu Gly Arg Trp Phe Trp Trp Pro Gln Arg
915 920 925 
Val Arg Pro Arg Pro Ala Ser Arg Met Leu Arg Pro Tyr Gly Pro Arg
930 935 940 

Pro Val Val Arg Glu Leu Leu Leu Arg Glu Gly Asn Asp Asp Pro Arg 
945 950 955 960 

Thr Gln Val Ala Thr His Arg
965 



<210> 3 
<211> 148 
<212> PRT 

<213> Mycobacterium tuberculosis
<220> 

<223> mmpS6 protein
<400> 3 

Val Gln Gly Ile Ser Val Thr Gly Leu Val Lys Arg Gly Trp Met Val
1 5 10 15

Leu Val Ala Val Ala Val Val Ala Val Ala Gly Phe Ser Val Tyr Arg 
20 25 30 

Leu His Gly Ile Phe Gly Ser His Asp Thr Thr Ser Thr Ala Gly Gly
35 40 45 

Val Ala Asn Asp Ile Lys Pro Phe Asn Pro Lys Gln Val Thr Leu Glu
50 55 60 

Val Phe Gly Ala Pro Gly Thr Val Ala Thr Ile Asn Tyr Leu Asp Val
65 70 75 80 

Asp Ala Thr Pro Arg Gln Val Leu Asp Thr Thr Leu Pro Trp Ser Tyr
85 90 95 

Thr Ile Thr Thr Thr Leu Pro Ala Val Phe Ala Asn Val Val Ala Gln
100 105 110 

Gly Asp Ser Asn Ser Ile Gly Cys Arg Ile Thr Val Asn Gly Val Val
115 120 125 

Lys Asp Glu Arg Ile Val Asn Glu Val Arg Ala Tyr Thr Phe Cys Leu
130 135 140 

Asp Lys Ser Ser 
145 



<210> 4 
<211> 2153 
<212> DNA 

<213> Mycobacterium tuberculosis 
<400> 4 

ctggttgccg tggcggtggt ggcggtcgcg ggattcagcg tctatcggtt gcacggcatc 60
ttcggctcgc acgacaccac ctcgaccgcc ggtggtgtcg cgaacgacat caagccgttc 120
aaccccaaac aggtaaccct cgaggtcttt ggcgctcccg gaaccgtggc aacgatcaat 180
tatctggacg tggatgccac acctcggcaa gtcctggaca cgaccctgcc gtggtcatac 240
acgatcacga cgaccctgcc cgcggtcttc gccaatgttg tcgcgcaagg cgacagcaat 300 
tccatcggct gccgcatcac cgtcaacggt gtagtcaagg acgaaaggat cgtcaacgaa 360 
gtgcgcgcct ataccttctg cctcgacaag tcctcatgag caaccaccac cgcccgcggc 420 
cttggttgcc gcacaccatc cgacggcttt cgttgccgat cttgctgttt tgggtgggtg 480 
tggccgccat aaccaatgcc gccgtgccgc aattggaggt ggtcggggag gcgcataacg 540 
tcgcacagag ctccccggat gacccgtcgc tgcaggcgat gaaacgcatc ggcaaggtgt 600 
tccacgagtt cgattccgac agtgcggcca tgatcgtctt ggaaggcgat aagccgctcg 660 
gcaacgacgc ccaccggttc tacgacaccc tgctccgcaa cctttcaaac gacaccaaac 720 
acgtcgagca cgttcaggac ttctggggcg atccgctgac cgcggccggc tcgcaaagca 780 
ccgacggcaa agccgcctac gttcaggtct atctcgccgg caaccaaggc gaggcgttgt 840 
caatcgagtc cgtcgacgcg gtgcgcgaca tcgtcgccca tacgccacca ccggccgggg 900
tcaaggccta cgtcaccggc gcggccccgc tcatggccga tcagtttcag gtgggcagca 960 
aaggaaccgc gaaagttacc gggataactc tggttgtgat cgcggtgatg ttgctcttcg 1020 
tataccgttc cgtcgtcacc atggtcctgg tgcttatcac ggttcttatt gagttggccg 1080
cggcccgcgg gatcgtcgct tttctcggaa acgccggggt aatcgggctg tcgacatact 1140 
cgacgaatct gctcacacta ttggtaatcg cggcgggcac agactacgcg atttttgtcc 1200 
tcggccgcta tcacgaggcg cgctacgccg cacaggatcg ggaaacggcc ttctacacga 1260
tgtatcgcgg gaccgcccac gtcgtcttgg gctcgggtct gaccgttgcc ggcgcggtgt 1320
attgcctgag ctttacccgg ctaccctatt ttcaaagcct gggtattccc gcctcgatag 1380 
gggtgatgat tgcgttggca gccgcgctca gcctggcccc atccgtgctc atcttgggca 1440 
gtcgtttcgg ttgtttcgaa cccaagcgca ggatgaggac caggggatgg cggcgcatcg 1500 
gcacggccat cgtgcgttgg ccgggaccca tcctggcagt ggcgtgcgca attgcggtgg 1560 
tgggtctgct cgcgctgccg ggatacaaaa cgagctacga cgctcgctat tacatgcccg 1620 
ccaccgcccc ggccaatatt ggctacatgg ccgcggagcg acattttccc caagcgcggc 1680 
tgaatcccga actactgatg atcgagacgg atcacgatat gcgcaatccg gccgacatgc 1740 
tcatcttgga taggatcgcc aaggctgtct tccatctgcc cggcataggg ctggtgcagg 1800 
ccatgacccg gccgctagga accccgattg accacagctc gataccgttt cagatcagca 1860 
tgcaaagcgt cggccagatt cagaatctca agtatcagag ggaccgagca gccgacttgc 1920 
tgaagcaggc cgaagagctg gggaagacga tcgaaatctt gcagcgccaa tatgccctac 1980 
agcaggaact cgcggccgct actcacgagc aagccgaaag ctttcaccaa acgatcgcca 2040
cggtaaacga actgcgagat aggatcgcca atttcgacga tttcttcagg ccgattcgta 2100 
gttactttta ctgggaaaag cactgctacg atatcccgag ctgctgggcg ctg 2153



<210> 5 
<211> 2904 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> mmpL6 sequence and protein

<220> 

<221> CDS 

<222> (1)..(2901)

<400> 5

atg agc aac cac cac cgc ccg cgg cct tgg ttg ccg cac acc atc cga 48
Met Ser Asn His His Arg Pro Arg Pro Trp Leu Pro His Thr Ile Arg
1 5 10 15

cgg ctt tcg ttg ccg atc ttg ctg ttt tgg gtg ggt gtg gcc gcc ata 96
Arg Leu Ser Leu Pro Ile Leu Leu Phe Trp Val Gly Val Ala Ala Ile
20 25 30

acc aat gcc gcc gtg ccg caa ttg gag gtg gtc ggg gag gcg cat aac 144
Thr Asn Ala Ala Val Pro Gln Leu Glu Val Val Gly Glu Ala His Asn
35 40 45

gtc gca cag agc tcc ccg gat gac ccg tcg ctg cag gcg atg aaa cgc 192
Val Ala Gln Ser Ser Pro Asp Asp Pro Ser Leu Gln Ala Met Lys Arg
50 55 60

atc ggc aag gtg ttc cac gag ttc gat tcc gac agt gcg gcc atg atc 240
Ile Gly Lys Val Phe His Glu Phe Asp Ser Asp Ser Ala Ala Met Ile
65 70 75 80

gtc ttg gaa ggc gat aag ccg ctc ggc aac gac gcc cac cgg ttc tac 288
Val Leu Glu Gly Asp Lys Pro Leu Gly Asn Asp Ala His Arg Phe Tyr
85 90 95

gac acc ctg ctc cgc aac ctt tca aac gac acc aaa cac gtc gag cac 336
Asp Thr Leu Leu Arg Asn Leu Ser Asn Asp Thr Lys His Val Glu His
100 105 110

gtt cag gac ttc tgg ggc gat ccg ctg acc gcg gcc ggc tcg caa agc 384
Val Gln Asp Phe Trp Gly Asp Pro Leu Thr Ala Ala Gly Ser Gln Ser
115 120 125

acc gac ggc aaa gcc gcc tac gtt cag gtc tat ctc gcc ggc aac caa 432
Thr Asp Gly Lys Ala Ala Tyr Val Gln Val Tyr Leu Ala Gly Asn Gln
130 135 140

ggc gag gcg ttg tca atc gag tcc gtc gac gcg gtg cgc gac atc gtc 480
Gly Glu Ala Leu Ser Ile Glu Ser Val Asp Ala Val Arg Asp Ile Val
145 150 155 160

gcc cat acg cca cca ccg gcc ggg gtc aag gcc tac gtc acc ggc gcg 528
Ala His Thr Pro Pro Pro Ala Gly Val Lys Ala Tyr Val Thr Gly Ala
165 170 175

gcc ccg ctc atg gcc gat cag ttt cag gtg ggc agc aaa gga acc gcg 576
Ala Pro Leu Met Ala Asp Gln Phe Gln Val Gly Ser Lys Gly Thr Ala
180 185 190

aaa gtt acc ggg ata act ctg gtt gtg atc gcg gtg atg ttg ctc ttc 624
Lys Val Thr Gly Ile Thr Leu Val Val Ile Ala Val Met Leu Leu Phe
195 200 205

gta tac cgt tcc gtc gtc acc atg gtc ctg gtg ctt atc acg gtt ctt 672
Val Tyr Arg Ser Val Val Thr Met Val Leu Val Leu Ile Thr Val Leu
210 215 220

att gag ttg gcc gcg gcc cgc ggg atc gtc gct ttt ctc gga aac gcc 720
Ile Glu Leu Ala Ala Ala Arg Gly Ile Val Ala Phe Leu Gly Asn Ala
225 230 235 240

ggg gta atc ggg ctg tcg aca tac tcg acg aat ctg ctc aca cta ttg 768
Gly Val Ile Gly Leu Ser Thr Tyr Ser Thr Asn Leu Leu Thr Leu Leu
245 250 255

gta atc gcg gcg ggc aca gac tac gcg att ttt gtc ctc ggc cgc tat 816
Val Ile Ala Ala Gly Thr Asp Tyr Ala Ile Phe Val Leu Gly Arg Tyr
260 265 270

cac gag gcg cgc tac gcc gca cag gat cgg gaa acg gcc ttc tac acg 864
His Glu Ala Arg Tyr Ala Ala Gln Asp Arg Glu Thr Ala Phe Tyr Thr
275 280 285

atg tat cgc ggg acc gcc cac gtc gtc ttg ggc tcg ggt ctg acc gtt 912
Met Tyr Arg Gly Thr Ala His Val Val Leu Gly Ser Gly Leu Thr Val
290 295 300

gcc ggc gcg gtg tat tgc ctg agc ttt acc cgg cta ccc tat ttt caa 960
Ala Gly Ala Val Tyr Cys Leu Ser Phe Thr Arg Leu Pro Tyr Phe Gln
305 310 315 320

agc ctg ggt att ccc gcc tcg ata ggg gtg atg att gcg ttg gca gcc 1008
Ser Leu Gly Ile Pro Ala Ser Ile Gly Val Met Ile Ala Leu Ala Ala
325 330 335

gcg ctc agc ctg gcc cca tcc gtg ctc atc ttg ggc agt cgt ttc ggt 1056
Ala Leu Ser Leu Ala Pro Ser Val Leu Ile Leu Gly Ser Arg Phe Gly
340 345 350

tgt ttc gaa ccc aag cgc agg atg agg acc agg gga tgg cgg cgc atc 1104
Cys Phe Glu Pro Lys Arg Arg Met Arg Thr Arg Gly Trp Arg Arg Ile
355 360 365

ggc acg gcc atc gtg cgt tgg ccg gga ccc atc ctg gca gtg gcg tgc 1152
Gly Thr Ala Ile Val Arg Trp Pro Gly Pro Ile Leu Ala Val Ala Cys
370 375 380

gca att gcg gtg gtg ggt ctg ctc gcg ctg ccg gga tac aaa acg agc 1200
Ala Ile Ala Val Val Gly Leu Leu Ala Leu Pro Gly Tyr Lys Thr Ser
385 390 395 400

tac gac gct cgc tat tac atg ccc gcc acc gcc ccg gcc aat att ggc 1248
Tyr Asp Ala Arg Tyr Tyr Met Pro Ala Thr Ala Pro Ala Asn Ile Gly
405 410 415

tac atg gcc gcg gag cga cat ttt ccc caa gcg cgg ctg aat ccc gaa 1296
Tyr Met Ala Ala Glu Arg His Phe Pro Gln Ala Arg Leu Asn Pro Glu
420 425 430

cta ctg atg atc gag acg gat cac gat atg cgc aat ccg gcc gac atg 1344
Leu Leu Met Ile Glu Thr Asp His Asp Met Arg Asn Pro Ala Asp Met
435 440 445

ctc atc ttg gat agg atc gcc aag gct gtc ttc cat ctg ccc ggc ata 1392
Leu Ile Leu Asp Arg Ile Ala Lys Ala Val Phe His Leu Pro Gly Ile
450 455 460

ggg ctg gtg cag gcc atg acc cgg ccg cta gga acc ccg att gac cac 1440
Gly Leu Val Gln Ala Met Thr Arg Pro Leu Gly Thr Pro Ile Asp His
465 470 475 480

agc tcg ata ccg ttt cag atc agc atg caa agc gtc ggc cag att cag 1488
Ser Ser Ile Pro Phe Gln Ile Ser Met Gln Ser Val Gly Gln Ile Gln
485 490 495

aat ctc aag tat cag agg gac cga gca gcc gac ttg ctg aag cag gcc 1536
Asn Leu Lys Tyr Gln Arg Asp Arg Ala Ala Asp Leu Leu Lys Gln Ala
500 505 510

gaa gag ctg ggg aag acg atc gaa atc ttg cag cgc caa tat gcc cta 1584
Glu Glu Leu Gly Lys Thr Ile Glu Ile Leu Gln Arg Gln Tyr Ala Leu
515 520 525

cag cag gaa ctc gcg gcc gct act cac gag caa gcc gaa agc ttt cac 1632
Gln Gln Glu Leu Ala Ala Ala Thr His Glu Gln Ala Glu Ser Phe His
530 535 540

caa acg atc gcc acg gta aag gaa ctg cga gat agg atc gcc aat ttc 1680
Gln Thr Ile Ala Thr Val Lys Glu Leu Arg Asp Arg Ile Ala Asn Phe
545 550 555 560

gac gat ttc ttc agg ccg att cgt agt tac ttt tac tgg gaa aag cac 1728
Asp Asp Phe Phe Arg Pro Ile Arg Ser Tyr Phe Tyr Trp Glu Lys His
565 570 575

tgc tac gat atc ccg agc tgc tgg gcg ctg aga tcc gtc ttt gac acg 1776
Cys Tyr Asp Ile Pro Ser Cys Trp Ala Leu Arg Ser Val Phe Asp Thr
580 585 590

atc gac ggt atc gac caa ctc ggc gag cag ctg gcc agc gtg acc gta 1824
Ile Asp Gly Ile Asp Gln Leu Gly Glu Gln Leu Ala Ser Val Thr Val
595 600 605

acc ttg gac aag ttg gct gcg atc cag cct caa ttg gtg gcg ctg cta 1872
Thr Leu Asp Lys Leu Ala Ala Ile Gln Pro Gln Leu Val Ala Leu Leu
610 615 620

cca gac gag atc gcc agc cag cag atc aat cgg gaa ctg gcg ctg gct 1920
Pro Asp Glu Ile Ala Ser Gln Gln Ile Asn Arg Glu Leu Ala Leu Ala
625 630 635 640

aac tac gcc acc atg tcc ggg atc tat gcc cag acg gcg gcc ttg atc 1968
Asn Tyr Ala Thr Met Ser Gly Ile Tyr Ala Gln Thr Ala Ala Leu Ile
645 650 655

gaa aac gct gcc gcc atg gga caa gcc ttt gac gcc gcc aag aac gac 2016
Glu Asn Ala Ala Ala Met Gly Gln Ala Phe Asp Ala Ala Lys Asn Asp
660 665 670

gac tcc ttc tat ctg ccg ccg gag gct ttt gac aac cca gat ttc cag 2064
Asp Ser Phe Tyr Leu Pro Pro Glu Ala Phe Asp Asn Pro Asp Phe Gln
675 680 685

cgc ggc ctg aaa ttg ttc ctg tcg gca gac ggt aag gcg gct cgg atg 2112
Arg Gly Leu Lys Leu Phe Leu Ser Ala Asp Gly Lys Ala Ala Arg Met
690 695 700

atc atc tcc cat gaa ggc gat ccc gcc acc ccc gaa ggc att tcg cat 2160
Ile Ile Ser His Glu Gly Asp Pro Ala Thr Pro Glu Gly Ile Ser His
705 710 715 720

atc gac gcg atc aag cag gcg gcc cac gag gcc gtg aag ggc act ccc 2208
Ile Asp Ala Ile Lys Gln Ala Ala His Glu Ala Val Lys Gly Thr Pro
725 730 735

atg gcg ggt gct ggg atc tat ctg gcc ggc acg gcc gcc acc ttc aag 2256
Met Ala Gly Ala Gly Ile Tyr Leu Ala Gly Thr Ala Ala Thr Phe Lys
740 745 750

gac att caa gac ggc gcc acc tac gac ctc ctg atc gcc gga ata gcc 2304
Asp Ile Gln Asp Gly Ala Thr Tyr Asp Leu Leu Ile Ala Gly Ile Ala
755 760 765

gcg ctg agc ttg att ttg ctc atc atg atg atc att acc cga agc ctg 2352
Ala Leu Ser Leu Ile Leu Leu Ile Met Met Ile Ile Thr Arg Ser Leu
770 775 780

gtt gcg gcg ctg gtg atc gtg ggc acg gtg gcg ctg tcg ttg ggc gct 2400
Val Ala Ala Leu Val Ile Val Gly Thr Val Ala Leu Ser Leu Gly Ala
785 790 795 800

tct ttt ggc ctg tcc gtg ctg gtg tgg cag cat ctt ctc ggt atc cag 2448
Ser Phe Gly Leu Ser Val Leu Val Trp Gln His Leu Leu Gly Ile Gln
805 810 815

ttg tac tgg atc gtg ctc gcg ctg gcc gtc atc ctg ctc ctg gcc gtg 2496
Leu Tyr Trp Ile Val Leu Ala Leu Ala Val Ile Leu Leu Leu Ala Val
820 825 830

gga tcg gac tat aac ttg ctg ctg att tcc cga ttc aag gag gag atc 2544
Gly Ser Asp Tyr Asn Leu Leu Leu Ile Ser Arg Phe Lys Glu Glu Ile
835 840 845

ggt gca ggt ttg aac acc ggc atc atc cgt gcg atg gcc ggc acc ggc 2592
Gly Ala Gly Leu Asn Thr Gly Ile Ile Arg Ala Met Ala Gly Thr Gly
850 855 860

ggg gtg gtg acc gct gcc ggc ctg gtg ttc gcc gcc act atg tct tcg 2640
Gly Val Val Thr Ala Ala Gly Leu Val Phe Ala Ala Thr Met Ser Ser
865 870 875 880

ttc gtg ttc agt gat ttg cgg gtc ctc ggt cag atc ggg acc acc att 2688
Phe Val Phe Ser Asp Leu Arg Val Leu Gly Gln Ile Gly Thr Thr Ile
885 890 895

ggt ctt ggg ctg ctg ttc gac acg ctg gtg gtg cgc gcg ttc atg acc 2736
Gly Leu Gly Leu Leu Phe Asp Thr Leu Val Val Arg Ala Phe Met Thr
900 905 910

ccg tcc atc gcg gtg ctg ctc ggg cgc tgg ttc tgg tgg ccg caa cga 2784
Pro Ser Ile Ala Val Leu Leu Gly Arg Trp Phe Trp Trp Pro Gln Arg
915 920 925

gtg cgc ccg cgc cct gcc agc agg atg ctt cgg ccg tac ggc ccg cgg 2832
Val Arg Pro Arg Pro Ala Ser Arg Met Leu Arg Pro Tyr Gly Pro Arg
930 935 940

ccc gtg gtt cgt gaa ttg ctg ctg cgc gag ggc aac gat gac ccg aga 2880
Pro Val Val Arg Glu Leu Leu Leu Arg Glu Gly Asn Asp Asp Pro Arg
945 950 955 960



act cag gtg gct acc cac cgt taa 2904
Thr Gln Val Ala Thr His Arg
965



<210> 6
<211> 967
<212> PRT

<213> Mycobacterium tuberculosis
<220>

<223> mmpL6 sequence and protein

<400> 6

Met Ser Asn His His Arg Pro Arg Pro Trp Leu Pro His Thr Ile Arg
1 5 10 15

Arg Leu Ser Leu Pro Ile Leu Leu Phe Trp Val Gly Val Ala Ala Ile
20 25 30

Thr Asn Ala Ala Val Pro Gln Leu Glu Val Val Gly Glu Ala His Asn
35 40 45

Val Ala Gln Ser Ser Pro Asp Asp Pro Ser Leu Gln Ala Met Lys Arg
50 55 60

Ile Gly Lys Val Phe His Glu Phe Asp Ser Asp Ser Ala Ala Met Ile
65 70 75 80

Val Leu Glu Gly Asp Lys Pro Leu Gly Asn Asp Ala His Arg Phe Tyr
85 90 95

Asp Thr Leu Leu Arg Asn Leu Ser Asn Asp Thr Lys His Val Glu His
100 105 110

Val Gln Asp Phe Trp Gly Asp Pro Leu Thr Ala Ala Gly Ser Gln Ser
115 120 125

Thr Asp Gly Lys Ala Ala Tyr Val Gln Val Tyr Leu Ala Gly Asn Gln
130 135 140

Gly Glu Ala Leu Ser Ile Glu Ser Val Asp Ala Val Arg Asp Ile Val
145 150 155 160

Ala His Thr Pro Pro Pro Ala Gly Val Lys Ala Tyr Val Thr Gly Ala
165 170 175

Ala Pro Leu Met Ala Asp Gln Phe Gln Val Gly Ser Lys Gly Thr Ala
180 185 190

Lys Val Thr Gly Ile Thr Leu Val Val Ile Ala Val Met Leu Leu Phe
195 200 205

Val Tyr Arg Ser Val Val Thr Met Val Leu Val Leu Ile Thr Val Leu
210 215 220

Ile Glu Leu Ala Ala Ala Arg Gly Ile Val Ala Phe Leu Gly Asn Ala
225 230 235 240

Gly Val Ile Gly Leu Ser Thr Tyr Ser Thr Asn Leu Leu Thr Leu Leu
245 250 255

Val Ile Ala Ala Gly Thr Asp Tyr Ala Ile Phe Val Leu Gly Arg Tyr
260 265 270

His Glu Ala Arg Tyr Ala Ala Gln Asp Arg Glu Thr Ala Phe Tyr Thr
275 280 285

Met Tyr Arg Gly Thr Ala His Val Val Leu Gly Ser Gly Leu Thr Val
290 295 300

Ala Gly Ala Val Tyr Cys Leu Ser Phe Thr Arg Leu Pro Tyr Phe Gln
305 310 315 320

Ser Leu Gly Ile Pro Ala Ser Ile Gly Val Met Ile Ala Leu Ala Ala
325 330 335

Ala Leu Ser Leu Ala Pro Ser Val Leu Ile Leu Gly Ser Arg Phe Gly
340 345 350

Cys Phe Glu Pro Lys Arg Arg Met Arg Thr Arg Gly Trp Arg Arg Ile
355 360 365

Gly Thr Ala Ile Val Arg Trp Pro Gly Pro Ile Leu Ala Val Ala Cys
370 375 380

Ala Ile Ala Val Val Gly Leu Leu Ala Leu Pro Gly Tyr Lys Thr Ser
385 390 395 400

Tyr Asp Ala Arg Tyr Tyr Met Pro Ala Thr Ala Pro Ala Asn Ile Gly
405 410 415

Tyr Met Ala Ala Glu Arg His Phe Pro Gln Ala Arg Leu Asn Pro Glu
420 425 430

Leu Leu Met Ile Glu Thr Asp His Asp Met Arg Asn Pro Ala Asp Met
435 440 445

Leu Ile Leu Asp Arg Ile Ala Lys Ala Val Phe His Leu Pro Gly Ile
450 455 460

Gly Leu Val Gln Ala Met Thr Arg Pro Leu Gly Thr Pro Ile Asp His
465 470 475 480

Ser Ser Ile Pro Phe Gln Ile Ser Met Gln Ser Val Gly Gln Ile Gln
485 490 495

Asn Leu Lys Tyr Gln Arg Asp Arg Ala Ala Asp Leu Leu Lys Gln Ala
500 505 510

Glu Glu Leu Gly Lys Thr Ile Glu Ile Leu Gln Arg Gln Tyr Ala Leu
515 520 525

Gln Gln Glu Leu Ala Ala Ala Thr His Glu Gln Ala Glu Ser Phe His
530 535 540

Gln Thr Ile Ala Thr Val Lys Glu Leu Arg Asp Arg Ile Ala Asn Phe
545 550 555 560

Asp Asp Phe Phe Arg Pro Ile Arg Ser Tyr Phe Tyr Trp Glu Lys His
565 570 575

Cys Tyr Asp Ile Pro Ser Cys Trp Ala Leu Arg Ser Val Phe Asp Thr
580 585 590

Ile Asp Gly Ile Asp Gln Leu Gly Glu Gln Leu Ala Ser Val Thr Val
595 600 605

Thr Leu Asp Lys Leu Ala Ala Ile Gln Pro Gln Leu Val Ala Leu Leu
610 615 620

Pro Asp Glu Ile Ala Ser Gln Gln Ile Asn Arg Glu Leu Ala Leu Ala
625 630 635 640

Asn Tyr Ala Thr Met Ser Gly Ile Tyr Ala Gln Thr Ala Ala Leu Ile
645 650 655

Glu Asn Ala Ala Ala Met Gly Gln Ala Phe Asp Ala Ala Lys Asn Asp
660 665 670

Asp Ser Phe Tyr Leu Pro Pro Glu Ala Phe Asp Asn Pro Asp Phe Gln
675 680 685

Arg Gly Leu Lys Leu Phe Leu Ser Ala Asp Gly Lys Ala Ala Arg Met
690 695 700

Ile Ile Ser His Glu Gly Asp Pro Ala Thr Pro Glu Gly Ile Ser His
705 710 715 720

Ile Asp Ala Ile Lys Gln Ala Ala His Glu Ala Val Lys Gly Thr Pro
725 730 735

Met Ala Gly Ala Gly Ile Tyr Leu Ala Gly Thr Ala Ala Thr Phe Lys
740 745 750

Asp Ile Gln Asp Gly Ala Thr Tyr Asp Leu Leu Ile Ala Gly Ile Ala
755 760 765

Ala Leu Ser Leu Ile Leu Leu Ile Met Met Ile Ile Thr Arg Ser Leu
770 775 780

Val Ala Ala Leu Val Ile Val Gly Thr Val Ala Leu Ser Leu Gly Ala
785 790 795 800

Ser Phe Gly Leu Ser Val Leu Val Trp Gln His Leu Leu Gly Ile Gln
805 810 815

Leu Tyr Trp Ile Val Leu Ala Leu Ala Val Ile Leu Leu Leu Ala Val
820 825 830

Gly Ser Asp Tyr Asn Leu Leu Leu Ile Ser Arg Phe Lys Glu Glu Ile
835 840 845

Gly Ala Gly Leu Asn Thr Gly Ile Ile Arg Ala Met Ala Gly Thr Gly
850 855 860

Gly Val Val Thr Ala Ala Gly Leu Val Phe Ala Ala Thr Met Ser Ser
865 870 875 880

Phe Val Phe Ser Asp Leu Arg Val Leu Gly Gln Ile Gly Thr Thr Ile
885 890 895

Gly Leu Gly Leu Leu Phe Asp Thr Leu Val Val Arg Ala Phe Met Thr
900 905 910

Pro Ser Ile Ala Val Leu Leu Gly Arg Trp Phe Trp Trp Pro Gln Arg
915 920 925

Val Arg Pro Arg Pro Ala Ser Arg Met Leu Arg Pro Tyr Gly Pro Arg
930 935 940

Pro Val Val Arg Glu Leu Leu Leu Arg Glu Gly Asn Asp Asp Pro Arg
945 950 955 960

Thr Gln Val Ala Thr His Arg
965



<210> 7
<211> 1758
<212> DNA

<213> Mycobacterium tuberculosis

<220>

<221> CDS

<222> (1)..(1758)

<220>

<223> mmpL6 truncated sequence and protein
<400> 7

atg agc aac cac cac cgc ccg cgg cct tgg ttg ccg cac acc atc cga 48
Met Ser Asn His His Arg Pro Arg Pro Trp Leu Pro His Thr Ile Arg
1 5 10 15

cgg ctt tcg ttg ccg atc ttg ctg ttt tgg gtg ggt gtg gcc gcc ata 96
Arg Leu Ser Leu Pro Ile Leu Leu Phe Trp Val Gly Val Ala Ala Ile
20 25 30

acc aat gcc gcc gtg ccg caa ttg gag gtg gtc ggg gag gcg cat aac 144
Thr Asn Ala Ala Val Pro Gln Leu Glu Val Val Gly Glu Ala His Asn
35 40 45

gtc gca cag agc tcc ccg gat gac ccg tcg ctg cag gcg atg aaa cgc 192
Val Ala Gln Ser Ser Pro Asp Asp Pro Ser Leu Gln Ala Met Lys Arg
50 55 60

atc ggc aag gtg ttc cac gag ttc gat tcc gac agt gcg gcc atg atc 240
Ile Gly Lys Val Phe His Glu Phe Asp Ser Asp Ser Ala Ala Met Ile
65 70 75 80

gtc ttg gaa ggc gat aag ccg ctc ggc aac gac gcc cac cgg ttc tac 288
Val Leu Glu Gly Asp Lys Pro Leu Gly Asn Asp Ala His Arg Phe Tyr
85 90 95

gac acc ctg ctc cgc aac ctt tca aac gac acc aaa cac gtc gag cac 336
Asp Thr Leu Leu Arg Asn Leu Ser Asn Asp Thr Lys His Val Glu His
100 105 110

gtt cag gac ttc tgg ggc gat ccg ctg acc gcg gcc ggc tcg caa agc 384
Val Gln Asp Phe Trp Gly Asp Pro Leu Thr Ala Ala Gly Ser Gln Ser
115 120 125

acc gac ggc aaa gcc gcc tac gtt cag gtc tat ctc gcc ggc aac caa 432
Thr Asp Gly Lys Ala Ala Tyr Val Gln Val Tyr Leu Ala Gly Asn Gln
130 135 140

ggc gag gcg ttg tca atc gag tcc gtc gac gcg gtg cgc gac atc gtc 480
Gly Glu Ala Leu Ser Ile Glu Ser Val Asp Ala Val Arg Asp Ile Val
145 150 155 160

gcc cat acg cca cca ccg gcc ggg gtc aag gcc tac gtc acc ggc gcg 528
Ala His Thr Pro Pro Pro Ala Gly Val Lys Ala Tyr Val Thr Gly Ala
165 170 175

gcc ccg ctc atg gcc gat cag ttt cag gtg ggc agc aaa gga acc gcg 576
Ala Pro Leu Met Ala Asp Gln Phe Gln Val Gly Ser Lys Gly Thr Ala
180 185 190

aaa gtt acc ggg ata act ctg gtt gtg atc gcg gtg atg ttg ctc ttc 624
Lys Val Thr Gly Ile Thr Leu Val Val Ile Ala Val Met Leu Leu Phe
195 200 205

gta tac cgt tcc gtc gtc acc atg gtc ctg gtg ctt atc acg gtt ctt 672
Val Tyr Arg Ser Val Val Thr Met Val Leu Val Leu Ile Thr Val Leu
210 215 220

att gag ttg gcc gcg gcc cgc ggg atc gtc gct ttt ctc gga aac gcc 720
Ile Glu Leu Ala Ala Ala Arg Gly Ile Val Ala Phe Leu Gly Asn Ala
225 230 235 240

ggg gta atc ggg ctg tcg aca tac tcg acg aat ctg ctc aca cta ttg 768
Gly Val Ile Gly Leu Ser Thr Tyr Ser Thr Asn Leu Leu Thr Leu Leu
245 250 255

gta atc gcg gcg ggc aca gac tac gcg att ttt gtc ctc ggc cgc tat 816
Val Ile Ala Ala Gly Thr Asp Tyr Ala Ile Phe Val Leu Gly Arg Tyr
260 265 270

cac gag gcg cgc tac gcc gca cag gat cgg gaa acg gcc ttc tac acg 864
His Glu Ala Arg Tyr Ala Ala Gln Asp Arg Glu Thr Ala Phe Tyr Thr
275 280 285

atg tat cgc ggg acc gcc cac gtc gtc ttg ggc tcg ggt ctg acc gtt 912
Met Tyr Arg Gly Thr Ala His Val Val Leu Gly Ser Gly Leu Thr Val
290 295 300

gcc ggc gcg gtg tat tgc ctg agc ttt acc cgg cta ccc tat ttt caa 960
Ala Gly Ala Val Tyr Cys Leu Ser Phe Thr Arg Leu Pro Tyr Phe Gln
305 310 315 320

agc ctg ggt att ccc gcc tcg ata ggg gtg atg att gcg ttg gca gcc 1008
Ser Leu Gly Ile Pro Ala Ser Ile Gly Val Met Ile Ala Leu Ala Ala
325 330 335

gcg ctc agc ctg gcc cca tcc gtg ctc atc ttg ggc agt cgt ttc ggt 1056
Ala Leu Ser Leu Ala Pro Ser Val Leu Ile Leu Gly Ser Arg Phe Gly
340 345 350

tgt ttc gaa ccc aag cgc agg atg agg acc agg gga tgg cgg cgc atc 1104
Cys Phe Glu Pro Lys Arg Arg Met Arg Thr Arg Gly Trp Arg Arg Ile
355 360 365

ggc acg gcc atc gtg cgt tgg ccg gga ccc atc ctg gca gtg gcg tgc 1152
Gly Thr Ala Ile Val Arg Trp Pro Gly Pro Ile Leu Ala Val Ala Cys
370 375 380

gca att gcg gtg gtg ggt ctg ctc gcg ctg ccg gga tac aaa acg agc 1200
Ala Ile Ala Val Val Gly Leu Leu Ala Leu Pro Gly Tyr Lys Thr Ser
385 390 395 400

tac gac gct cgc tat tac atg ccc gcc acc gcc ccg gcc aat att ggc 1248
Tyr Asp Ala Arg Tyr Tyr Met Pro Ala Thr Ala Pro Ala Asn Ile Gly
405 410 415

tac atg gcc gcg gag cga cat ttt ccc caa gcg cgg ctg aat ccc gaa 1296
Tyr Met Ala Ala Glu Arg His Phe Pro Gln Ala Arg Leu Asn Pro Glu
420 425 430

cta ctg atg atc gag acg gat cac gat atg cgc aat ccg gcc gac atg 1344
Leu Leu Met Ile Glu Thr Asp His Asp Met Arg Asn Pro Ala Asp Met
435 440 445

ctc atc ttg gat agg atc gcc aag gct gtc ttc cat ctg ccc ggc ata 1392
Leu Ile Leu Asp Arg Ile Ala Lys Ala Val Phe His Leu Pro Gly Ile
450 455 460

ggg ctg gtg cag gcc atg acc cgg ccg cta gga acc ccg att gac cac 1440
Gly Leu Val Gln Ala Met Thr Arg Pro Leu Gly Thr Pro Ile Asp His
465 470 475 480

agc tcg ata ccg ttt cag atc agc atg caa agc gtc ggc cag att cag 1488
Ser Ser Ile Pro Phe Gln Ile Ser Met Gln Ser Val Gly Gln Ile Gln
485 490 495

aat ctc aag tat cag agg gac cga gca gcc gac ttg ctg aag cag gcc 1536
Asn Leu Lys Tyr Gln Arg Asp Arg Ala Ala Asp Leu Leu Lys Gln Ala
500 505 510

gaa gag ctg ggg aag acg atc gaa atc ttg cag cgc caa tat gcc cta 1584
Glu Glu Leu Gly Lys Thr Ile Glu Ile Leu Gln Arg Gln Tyr Ala Leu
515 520 525

cag cag gaa ctc gcg gcc gct act cac gag caa gcc gaa agc ttt cac 1632
Gln Gln Glu Leu Ala Ala Ala Thr His Glu Gln Ala Glu Ser Phe His
530 535 540

caa acg atc gcc acg gta aag gaa ctg cga gat agg atc gcc aat ttc 1680
Gln Thr Ile Ala Thr Val Lys Glu Leu Arg Asp Arg Ile Ala Asn Phe
545 550 555 560

gac gat ttc ttc agg ccg att cgt agt tac ttt tac tgg gaa aag cac 1728
Asp Asp Phe Phe Arg Pro Ile Arg Ser Tyr Phe Tyr Trp Glu Lys His
565 570 575

tgc tac gat atc ccg agc tgc tgg gcg ctg 1758
Cys Tyr Asp Ile Pro Ser Cys Trp Ala Leu
580 585

<210> 8 
<211> 586 
<212> PRT 

<213> Mycobacterium tuberculosis
<220>

<223> mmpL6 truncated sequence and protein
<400> 8 

Met Ser Asn His His Arg Pro Arg Pro Trp Leu Pro His Thr Ile Arg
1 5 10 15

Arg Leu Ser Leu Pro Ile Leu Leu Phe Trp Val Gly Val Ala Ala Ile
20 25 30

Thr Asn Ala Ala Val Pro Gln Leu Glu Val Val Gly Glu Ala His Asn
35 40 45

Val Ala Gln Ser Ser Pro Asp Asp Pro Ser Leu Gln Ala Met Lys Arg
50 55 60

Ile Gly Lys Val Phe His Glu Phe Asp Ser Asp Ser Ala Ala Met Ile
65 70 75 80

Val Leu Glu Gly Asp Lys Pro Leu Gly Asn Asp Ala His Arg Phe Tyr
85 90 95

Asp Thr Leu Leu Arg Asn Leu Ser Asn Asp Thr Lys His Val Glu His
100 105 110

Val Gln Asp Phe Trp Gly Asp Pro Leu Thr Ala Ala Gly Ser Gln Ser
115 120 125

Thr Asp Gly Lys Ala Ala Tyr Val Gln Val Tyr Leu Ala Gly Asn Gln
130 135 140

Gly Glu Ala Leu Ser Ile Glu Ser Val Asp Ala Val Arg Asp Ile Val
145 150 155 160

Ala His Thr Pro Pro Pro Ala Gly Val Lys Ala Tyr Val Thr Gly Ala
165 170 175

Ala Pro Leu Met Ala Asp Gln Phe Gln Val Gly Ser Lys Gly Thr Ala
180 185 190

Lys Val Thr Gly Ile Thr Leu Val Val Ile Ala Val Met Leu Leu Phe
195 200 205

Val Tyr Arg Ser Val Val Thr Met Val Leu Val Leu Ile Thr Val Leu
210 215 220

Ile Glu Leu Ala Ala Ala Arg Gly Ile Val Ala Phe Leu Gly Asn Ala
225 230 235 240

Gly Val Ile Gly Leu Ser Thr Tyr Ser Thr Asn Leu Leu Thr Leu Leu
245 250 255

Val Ile Ala Ala Gly Thr Asp Tyr Ala Ile Phe Val Leu Gly Arg Tyr
260 265 270

His Glu Ala Arg Tyr Ala Ala Gln Asp Arg Glu Thr Ala Phe Tyr Thr
275 280 285

Met Tyr Arg Gly Thr Ala His Val Val Leu Gly Ser Gly Leu Thr Val
290 295 300

Ala Gly Ala Val Tyr Cys Leu Ser Phe Thr Arg Leu Pro Tyr Phe Gln
305 310 315 320

Ser Leu Gly Ile Pro Ala Ser Ile Gly Val Met Ile Ala Leu Ala Ala
325 330 335

Ala Leu Ser Leu Ala Pro Ser Val Leu Ile Leu Gly Ser Arg Phe Gly
340 345 350

Cys Phe Glu Pro Lys Arg Arg Met Arg Thr Arg Gly Trp Arg Arg Ile
355 360 365

Gly Thr Ala Ile Val Arg Trp Pro Gly Pro Ile Leu Ala Val Ala Cys
370 375 380

Ala Ile Ala Val Val Gly Leu Leu Ala Leu Pro Gly Tyr Lys Thr Ser
385 390 395 400

Tyr Asp Ala Arg Tyr Tyr Met Pro Ala Thr Ala Pro Ala Asn Ile Gly
405 410 415

Tyr Met Ala Ala Glu Arg His Phe Pro Gln Ala Arg Leu Asn Pro Glu
420 425 430

Leu Leu Met Ile Glu Thr Asp His Asp Met Arg Asn Pro Ala Asp Met
435 440 445

Leu Ile Leu Asp Arg Ile Ala Lys Ala Val Phe His Leu Pro Gly Ile
450 455 460

Gly Leu Val Gln Ala Met Thr Arg Pro Leu Gly Thr Pro Ile Asp His
465 470 475 480

Ser Ser Ile Pro Phe Gln Ile Ser Met Gln Ser Val Gly Gln Ile Gln
485 490 495

Asn Leu Lys Tyr Gln Arg Asp Arg Ala Ala Asp Leu Leu Lys Gln Ala
500 505 510

Glu Glu Leu Gly Lys Thr Ile Glu Ile Leu Gln Arg Gln Tyr Ala Leu
515 520 525

Gln Gln Glu Leu Ala Ala Ala Thr His Glu Gln Ala Glu Ser Phe His
530 535 540

Gln Thr Ile Ala Thr Val Lys Glu Leu Arg Asp Arg Ile Ala Asn Phe
545 550 555 560

Asp Asp Phe Phe Arg Pro Ile Arg Ser Tyr Phe Tyr Trp Glu Lys His
565 570 575

Cys Tyr Asp Ile Pro Ser Cys Trp Ala Leu
580 585



<210> 9 
<211> 447 
<212> DNA 

<213> Mycobacterium tuberculosis 

<220>

<221> CDS

<222> (1)..(444)

<220>

<223> mmpS6 sequence and protein

<400> 9

gtg cag ggg att tca gtg act ggc ctg gtc aaa cgc ggc tgg atg gtg 48
Val Gln Gly Ile Ser Val Thr Gly Leu Val Lys Arg Gly Trp Met Val
1 5 10 15

ctg gtt gcc gtg gcg gtg gtg gcg gtc gcg gga ttc agc gtc tat cgg 96
Leu Val Ala Val Ala Val Val Ala Val Ala Gly Phe Ser Val Tyr Arg
20 25 30

ttg cac ggc atc ttc ggc tcg cac gac acc acc tcg acc gcc ggt ggt 144
Leu His Gly Ile Phe Gly Ser His Asp Thr Thr Ser Thr Ala Gly Gly
35 40 45

gtc gcg aac gac atc aag ccg ttc aac ccc aaa cag gta acc ctc gag 192
Val Ala Asn Asp Ile Lys Pro Phe Asn Pro Lys Gln Val Thr Leu Glu
50 55 60

gtc ttt ggc gct ccc gga acc gtg gca acg atc aat tat ctg gac gtg 240
Val Phe Gly Ala Pro Gly Thr Val Ala Thr Ile Asn Tyr Leu Asp Val
65 70 75 80

gat gcc aca cct cgg caa gtc ctg gac acg acc ctg ccg tgg tca tac 288
Asp Ala Thr Pro Arg Gln Val Leu Asp Thr Thr Leu Pro Trp Ser Tyr
85 90 95

acg atc acg acg acc ctg ccc gcg gtc ttc gcc aat gtt gtc gcg caa 336
Thr Ile Thr Thr Thr Leu Pro Ala Val Phe Ala Asn Val Val Ala Gln
100 105 110

ggc gac agc aat tcc atc ggc tgc cgc atc acc gtc aac ggt gta gtc 384
Gly Asp Ser Asn Ser Ile Gly Cys Arg Ile Thr Val Asn Gly Val Val
115 120 125

aag gac gaa agg atc gtc aac gaa gtg cgc gcc tat acc ttc tgc ctc 432
Lys Asp Glu Arg Ile Val Asn Glu Val Arg Ala Tyr Thr Phe Cys Leu
130 135 140

gac aag tcc tca tga 447
Asp Lys Ser Ser
145



<210> 10 
<211> 148 
<212> PRT 

<213> Mycobacterium tuberculosis
<220>

<223> mmpS6 sequence and protein
<400> 10

Val Gln Gly Ile Ser Val Thr Gly Leu Val Lys Arg Gly Trp Met Val
1 5 10 15

Leu Val Ala Val Ala Val Val Ala Val Ala Gly Phe Ser Val Tyr Arg
20 25 30

Leu His Gly Ile Phe Gly Ser His Asp Thr Thr Ser Thr Ala Gly Gly
35 40 45

Val Ala Asn Asp Ile Lys Pro Phe Asn Pro Lys Gln Val Thr Leu Glu
50 55 60

Val Phe Gly Ala Pro Gly Thr Val Ala Thr Ile Asn Tyr Leu Asp Val
65 70 75 80

Asp Ala Thr Pro Arg Gln Val Leu Asp Thr Thr Leu Pro Trp Ser Tyr
85 90 95

Thr Ile Thr Thr Thr Leu Pro Ala Val Phe Ala Asn Val Val Ala Gln
100 105 110

Gly Asp Ser Asn Ser Ile Gly Cys Arg Ile Thr Val Asn Gly Val Val
115 120 125

Lys Asp Glu Arg Ile Val Asn Glu Val Arg Ala Tyr Thr Phe Cys Leu
130 135 140

Asp Lys Ser Ser
145



<210> 11

<211> 399
<212> DNA

<213> Mycobacterium tuberculosis

<220>
<221> CDS

<222> (1)..(399)

<220>
<223> mmpS6 truncated sequence and protein

<400> 11

ctg gtt gcc gtg gcg gtg gtg gcg gtc gcg gga ttc agc gtc tat cgg 48
Leu Val Ala Val Ala Val Val Ala Val Ala Gly Phe Ser Val Tyr Arg
1 5 10 15

ttg cac ggc atc ttc ggc tcg cac gac acc acc tcg acc gcc ggt ggt 96
Leu His Gly Ile Phe Gly Ser His Asp Thr Thr Ser Thr Ala Gly Gly
20 25 30

gtc gcg aac gac atc aag ccg ttc aac ccc aaa cag gta acc ctc gag 144
Val Ala Asn Asp Ile Lys Pro Phe Asn Pro Lys Gln Val Thr Leu Glu
35 40 45

gtc ttt ggc gct ccc gga acc gtg gca acg atc aat tat ctg gac gtg 192
Val Phe Gly Ala Pro Gly Thr Val Ala Thr Ile Asn Tyr Leu Asp Val
50 55 60

gat gcc aca cct cgg caa gtc ctg gac acg acc ctg ccg tgg tca tac 240
Asp Ala Thr Pro Arg Gln Val Leu Asp Thr Thr Leu Pro Trp Ser Tyr
65 70 75 80

acg atc acg acg acc ctg ccc gcg gtc ttc gcc aat gtt gtc gcg caa 288
Thr Ile Thr Thr Thr Leu Pro Ala Val Phe Ala Asn Val Val Ala Gln
85 90 95

ggc gac agc aat tcc atc ggc tgc cgc atc acc gtc aac ggt gta gtc 336
Gly Asp Ser Asn Ser Ile Gly Cys Arg Ile Thr Val Asn Gly Val Val
100 105 110

aag gac gaa agg atc gtc aac gaa gtg cgc gcc tat acc ttc tgc ctc 384
Lys Asp Glu Arg Ile Val Asn Glu Val Arg Ala Tyr Thr Phe Cys Leu
115 120 125

gac aag tcc tca tga 399
Asp Lys Ser Ser
130



<210> 12 
<211> 132 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220> 

<223> mmpS6 truncated sequence and protein
<400> 12

Leu Val Ala Val Ala Val Val Ala Val Ala Gly Phe Ser Val Tyr Arg
1 5 10 15

Leu His Gly Ile Phe Gly Ser His Asp Thr Thr Ser Thr Ala Gly Gly
20 25 30

Val Ala Asn Asp Ile Lys Pro Phe Asn Pro Lys Gln Val Thr Leu Glu
35 40 45

Val Phe Gly Ala Pro Gly Thr Val Ala Thr Ile Asn Tyr Leu Asp Val
50 55 60

Asp Ala Thr Pro Arg Gln Val Leu Asp Thr Thr Leu Pro Trp Ser Tyr
65 70 75 80

Thr Ile Thr Thr Thr Leu Pro Ala Val Phe Ala Asn Val Val Ala Gln
85 90 95

Gly Asp Ser Asn Ser Ile Gly Cys Arg Ile Thr Val Asn Gly Val Val
100 105 110

Lys Asp Glu Arg Ile Val Asn Glu Val Arg Ala Tyr Thr Phe Cys Leu
115 120 125

Asp Lys Ser Ser
130



<210> 13 
<211> 20 
<212> DNA 

<213> Mycobacterium tuberculosis 
<400> 13 

cgttcaaccc caaacaggta 20



<210> 14 
<211> 20 
<212> DNA 

<213> Mycobacterium tuberculosis

<400> 14

aatcgaactc gtggaacacc 20



<210> 15 
<211> 20 
<212> DNA 

<213> Mycobacterium tuberculosis

<400> 15
attcagcgtc tatcggttgc 20

<210> 16 
<211> 20 
<212> DNA 

<213> Mycobacterium tuberculosis 

<400> 16
agcagctcgg gatatcgtag 20

<210> 17 
<211> 20 
<212> DNA 

<213> Mycobacterium tuberculosis 

<400> 17
ctacctcatc ttccggtcca 20



<210> 18 
<211> 20 
<212> DNA 

<213> Mycobacterium tuberculosis 

<400> 18
catagatccc ggacatggtg 20

<210> 19 
<211> 2390 
<212> DNA 

<213> Mycobacterium tuberculosis

<220>
<221> CDS

<222> (517)..(2307)

<400> 19

gatcccgLg ccgcggcgct ggagctggcc gccgggcccg cagccgcccc gcgcgaggtc 60 
gtgctggcga gcaaagccac catgcgcgcc acagccagcc ccggatcgct ggaccttgag 120 
caacacgaac tcgccaaacg cttagaactt gggccgcagg cgaaatcggt ccagtcgccc 180 
gagttcgccg ctcgcttggc tgccgctcaa cacaggtagc gcctaccagc ctcgctggtt 240 
tccatggcgt gccccagtcc gaagctgctg ctgcttgact ccgcgcgctg ggcccgagcg 300 
cgcgctgttg tacggcccaa acggcgtgtc ggtgtacagt cgcgcgctcg cggcttcagt 360 
ccggcccccc gactccggca ggcccgacgg cgcccagcgc tagcccgaag ttcccccttg 420 
taggggcggg ctgagtttcg atctgtttcg tgagcaggtg tttctgtgtt caacttccct 480 
caacatgtac tcatgtatta ttgagaatag ctcggc gtg tca tcc tct gat gac 534

Val Ser Ser Ser Asp Asp 
1 5 



gct att atc gcg ctg acc gcg tgt tat aaa gta atc atg tac att acc 582
Ala Ile Ile Ala Leu Thr Ala Cys Tyr Lys Val Ile Met Tyr Ile Thr
10 15 20

cgg gta ccc aac cgg gga tcc ccg ccg gcg gtg ctg ttg cgg gaa agc 630
Arg Val Pro Asn Arg Gly Ser Pro Pro Ala Val Leu Leu Arg Glu Ser
25 30 35

ttc cgc gaa aac ggc aag gtc aag acg cgt acc ctg gcc aac ctc tca 678
Phe Arg Glu Asn Gly Lys Val Lys Thr Arg Thr Leu Ala Asn Leu Ser
40 45 50



cgc tgg ecc gag cac aag etg gac aga etg gac egg geg ett aag ggc 726 
Arg Trp Pro Glu His Lys Leu Asp Arg Leu Asp Arg Ala Leu Lys Gly 
55 60 65 70 

ttg ccg ccc gcg gac tgg gat cta gcc gag gcc ttc gat atc acc cgc 774
Leu Pro Pro Ala Asp Trp Asp Leu Ala Glu Ala Phe Asp Ile Thr Arg
75 80 85

agc ctg ccg cac ggg cat gtg gcc gcg gtg gcc ggc acc gcc gag aag 822
Ser Leu Pro His Gly His Val Ala Ala Val Ala Gly Thr Ala Glu Lys
90 95 100

ctg ggc ata ccc gag ctg atc gac ccc acc ccg tcg cgg cgg cgc aac 870
Leu Gly Ile Pro Glu Leu Ile Asp Pro Thr Pro Ser Arg Arg Arg Asn
105 110 115

ctg gtg ctg gcc atg ctg atc ggg cag atc atc gag ccc gga tcg aaa 918
Leu Val Leu Ala Met Leu Ile Gly Gln Ile Ile Glu Pro Gly Ser Lys
120 125 130

ctg gcg atc gcg cgc ggg ctg cgc gcc cag acc gcc acc agc acg ctg 966
Leu Ala Ile Ala Arg Gly Leu Arg Ala Gln Thr Ala Thr Ser Thr Leu
135 140 145 150

ggt gcg gtg ctg ggt gtc tcg ggc gcc gat gag gac gac ctg tat gac 1014
Gly Ala Val Leu Gly Val Ser Gly Ala Asp Glu Asp Asp Leu Tyr Asp
155 160 165

gcg atg gac tgg gcg ctg gag cgc aaa gac ggc atc gaa aac gcc ttg 1062
Ala Met Asp Trp Ala Leu Glu Arg Lys Asp Gly Ile Glu Asn Ala Leu
170 175 180

gcc gca cgg cat ctg acc aac ggc acc ctg gtg ctc tat gac gta tcc 1110
Ala Ala Arg His Leu Thr Asn Gly Thr Leu Val Leu Tyr Asp Val Ser
185 190 195

tcg gcg gcg ttc gag ggc cac acc tgc ccg ctg gga gcg atc ggg cac 1158
Ser Ala Ala Phe Glu Gly His Thr Cys Pro Leu Gly Ala Ile Gly His
200 205 210

gcc cgc gac ggg gtc aaa ggc cgg ctg cag atc gtc tac ggg ctg ctg 1206
Ala Arg Asp Gly Val Lys Gly Arg Leu Gln Ile Val Tyr Gly Leu Leu
215 220 225 230

tgc tca ccc aag gga gcg ccg gtg gcc atc gag gtg ttc aag ggc aac 1254
Cys Ser Pro Lys Gly Ala Pro Val Ala Ile Glu Val Phe Lys Gly Asn
235 240 245

acc gcc gac ccg aaa act ctg aaa gct caa atc gac aag ctc aaa acc 1302
Thr Ala Asp Pro Lys Thr Leu Lys Ala Gln Ile Asp Lys Leu Lys Thr
250 255 260

cgg ttc ggg ttg acc cgc atc gcc ctg gtg ggc gat cgg ggc atg ctc 1350
Arg Phe Gly Leu Thr Arg Ile Ala Leu Val Gly Asp Arg Gly Met Leu
265 270 275

act tcc gcg cgc atc cgt gac gag ctg cgt ccg gcg cac ctg gat tgg 1398
Thr Ser Ala Arg Ile Arg Asp Glu Leu Arg Pro Ala His Leu Asp Trp
280 285 290

atc aac acq ctg cgc gcc ccg cag atc aag atc ctg ctc gag gac ggg 1446
Ile Ser Ala Leu Arg Ala Pro Gln Ile Lys Ile Leu Leu Glu Asp Gly
295 300 305 310

gcg ctg cag ctg tcg ctg ttc gat gag cag aac ctg ttc gag atc act 1494
Ala Leu Gln Leu Ser Leu Phe Asp Glu Gln Asn Leu Phe Glu Ile Thr
315 320 325

cac ccc gac tat ccc ggt gag cgg ctg gtg tgc tgc cac aac ccc gcc 1542
His Pro Asp Tyr Pro Gly Glu Arg Leu Val Cys Cys His Asn Pro Ala
330 335 340

cta acc gac gag cgc gcc cgc aaa cgc gcc gag ctg ctg gcg gcc acc 1590
Leu Ala Asp Glu Arg Ala Arg Lys Arg Ala Glu Leu Leu Ala Ala Thr
345 350 355

gaa aag gag ctg cag gcc atc gcc gaa gcc acc cgc cgc caa cgc cgg 1638
Glu Lys Glu Leu Gln Ala Ile Ala Glu Ala Thr Arg Arg Gln Arg Arg
360 365 370

ccg tta cgc ggt aca gac aag atc ggc ctg cgg gtg ggc aag gtg cgc 1686
Pro Leu Arg Gly Thr Asp Lys Ile Gly Leu Arg Val Gly Lys Val Arg
375 380 385 390

aac aag ttc aag atg gcc aag cac ttt gac ctg cac atc acc gat gag 1734
Asn Lys Phe Lys Met Ala Lys His Phe Asp Leu His Ile Thr Asp Glu
395 400 405

gcc ttc agc ttc acc cgc aac cag aac agt atc gcc gcc gag gcc gcc 1782
Ala Phe Ser Phe Thr Arg Asn Gln Asn Ser Ile Ala Ala Glu Ala Ala
410 415 420

ctc gac ggc atc tac gtg cta cgc acc agc ctg ccc gac aac gcc ctg 1830
Leu Asp Gly Ile Tyr Val Leu Arg Thr Ser Leu Pro Asp Asn Ala Leu
425 430 435

ggc cgc gac gac gtg gtg ggc cgc tac aaa gac ctc gcc gac gtc gaa 1878
Gly Arg Asp Asp Val Val Gly Arg Tyr Lys Asp Leu Ala Asp Val Glu
440 445 450

cgc ttc ttc cgc acc ctc aac agc gaa ctg gac gta cgc ccc atc cgg 1926
Arg Phe Phe Arg Thr Leu Asn Ser Glu Leu Asp Val Arg Pro Ile Arg
455 460 465 470

cat cgg ctg gcc gac cgg gtc cgc gcc cac atg ttc ttg cac atg ctc 1974
His Arg Leu Ala Asp Arg Val Arg Ala His Met Phe Leu His Met Leu
475 480 485

tcc tac tac atc agc tgg cac atg aaa caa gcc ctg gcc cca atc ctg 2022
Ser Tyr Tyr Ile Ser Trp His Met Lys Gln Ala Leu Ala Pro Ile Leu
490 495 500

ttc acc gac aac gac aaa ccc gcc gcc gcc gcc aaa cgc gcc gac ccc 2070
Phe Thr Asp Asn Asp Lys Pro Ala Ala Ala Ala Lys Arg Ala Asp Pro
505 510 515

gtc gcg cca gcc caa cgc tcc gac gaa gcg ctg aac aag gca gca cgc 2118
Val Ala Pro Ala Gln Arg Ser Asp Glu Ala Leu Asn Lys Ala Ala Arg
520 525 530

aaa cgc acc gaa gac aac caa ccg gtg cac agc ttc acc agc ctg ctc 2166
Lys Arg Thr Glu Asp Asn Gln Pro Val His Ser Phe Thr Ser Leu Leu
535 540 545 550

acc gac ctg gcc acc atc tgc gcc aac tac atc caa ccc aca gac gac 2214
Thr Asp Leu Ala Thr Ile Cys Ala Asn Tyr Ile Gln Pro Thr Asp Asp
555 560 565

ctg cca gca ttc acc aaa acc acc acc ccc acc ccc aca caa cgg cgc 2262
Leu Pro Ala Phe Thr Lys Thr Thr Thr Pro Thr Pro Thr Gln Arg Arg
570 575 580

gcc ttc gac cta ctg gcc gtt tcc cac cgc cac ggc ctg gcg tag 2307
Ala Phe Asp Leu Leu Ala Val Ser His Arg His Gly Leu Ala
585 590 595

tcagtaccga accacaaatg cccaggtcaa cgacacaaac cgcgccggat cagggggaac 2367

ttcgggctag ccgggcgcgc cgg 2390
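The right-hand numbers in the listing above are cumulative nucleotide positions. The labels imply that nucleotides 1 to 516 precede the coding sequence, so the gtg start codon begins at nucleotide 517 and the last base of codon n lies at 516 + 3n; the tag stop codon (codon 597) therefore ends at 2307, and the reading frame encodes 596 residues, the length given for SEQ ID NO: 20. A minimal Python sketch of that arithmetic follows; `CDS_START` and the function names are illustrative assumptions, not part of the listing.

```python
# Illustrative sketch: CDS_START is inferred from the position labels
# (the row ending "... gat gac 534" holds 516 upstream bases plus 6 codons).
CDS_START = 517


def codon_end_position(codon_number: int, cds_start: int = CDS_START) -> int:
    """1-based nucleotide position of the last base of codon `codon_number`."""
    return cds_start - 1 + 3 * codon_number


def residues_encoded(cds_start: int, stop_codon_end: int) -> int:
    """Number of amino acids encoded from the start codon, excluding the stop codon."""
    span = stop_codon_end - cds_start + 1
    if span % 3:
        raise ValueError("span is not a whole number of codons")
    return span // 3 - 1


if __name__ == "__main__":
    # Last residue on a row -> right-margin nucleotide label for that row.
    for last_residue in (6, 22, 358, 486):
        print(last_residue, codon_end_position(last_residue))
    print(residues_encoded(CDS_START, 2307))
```

Run as a script, it prints the labels 534, 582, 1590 and 1974 beside their residue numbers, followed by 596.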



<210> 20 
<211> 596 
<212> PRT 

<213> Mycobacterium tuberculosis 
<400> 20 

Val Ser Ser Ser Asp Asp Ala Ile Ile Ala Leu Thr Ala Cys Tyr Lys
1 5 10 15

Val Ile Met Tyr Ile Thr Arg Val Pro Asn Arg Gly Ser Pro Pro Ala
20 25 30

Val Leu Leu Arg Glu Ser Phe Arg Glu Asn Gly Lys Val Lys Thr Arg
35 40 45

Thr Leu Ala Asn Leu Ser Arg Trp Pro Glu His Lys Leu Asp Arg Leu
50 55 60

Asp Arg Ala Leu Lys Gly Leu Pro Pro Ala Asp Trp Asp Leu Ala Glu
65 70 75 80

Ala Phe Asp Ile Thr Arg Ser Leu Pro His Gly His Val Ala Ala Val
85 90 95

Ala Gly Thr Ala Glu Lys Leu Gly Ile Pro Glu Leu Ile Asp Pro Thr
100 105 110

Pro Ser Arg Arg Arg Asn Leu Val Leu Ala Met Leu Ile Gly Gln Ile
115 120 125 

Ile Glu Pro Gly Ser Lys Leu Ala Ile Ala Arg Gly Leu Arg Ala Gln
130 135 140 

Thr Ala Thr Ser Thr Leu Gly Ala Val Leu Gly Val Ser Gly Ala Asp 
145 150 155 160 

Glu Asp Asp Leu Tyr Asp Ala Met Asp Trp Ala Leu Glu Arg Lys Asp 
165 170 175 

Gly Ile Glu Asn Ala Leu Ala Ala Arg His Leu Thr Asn Gly Thr Leu
180 185 190 

Val Leu Tyr Asp Val Ser Ser Ala Ala Phe Glu Gly His Thr Cys Pro 
195 200 205 

Leu Gly Ala Ile Gly His Ala Arg Asp Gly Val Lys Gly Arg Leu Gln
210 215 220 

Ile Val Tyr Gly Leu Leu Cys Ser Pro Lys Gly Ala Pro Val Ala Ile
225 230 235 240 

Glu Val Phe Lys Gly Asn Thr Ala Asp Pro Lys Thr Leu Lys Ala Gln
245 250 255 

Ile Asp Lys Leu Lys Thr Arg Phe Gly Leu Thr Arg Ile Ala Leu Val
260 265 270 

Gly Asp Arg Gly Met Leu Thr Ser Ala Arg Ile Arg Asp Glu Leu Arg
275 280 285 

Pro Ala His Leu Asp Trp Ile Ser Ala Leu Arg Ala Pro Gln Ile Lys
290 295 300 

Ile Leu Leu Glu Asp Gly Ala Leu Gln Leu Ser Leu Phe Asp Glu Gln
305 310 315 320 

Asn Leu Phe Glu Ile Thr His Pro Asp Tyr Pro Gly Glu Arg Leu Val
325 330 335 

Cys Cys His Asn Pro Ala Leu Ala Asp Glu Arg Ala Arg Lys Arg Ala 
340 345 350

Glu Leu Leu Ala Ala Thr Glu Lys Glu Leu Gln Ala Ile Ala Glu Ala
355 360 365 

Thr Arg Arg Gln Arg Arg Pro Leu Arg Gly Thr Asp Lys Ile Gly Leu
370 375 380 

Arg Val Gly Lys Val Arg Asn Lys Phe Lys Met Ala Lys His Phe Asp 
385 390 395 400 

Leu His Ile Thr Asp Glu Ala Phe Ser Phe Thr Arg Asn Gln Asn Ser
405 410 415 

Ile Ala Ala Glu Ala Ala Leu Asp Gly Ile Tyr Val Leu Arg Thr Ser
420 425 430 
Leu Pro Asp Asn Ala Leu Gly Arg Asp Asp Val Val Gly Arg Tyr Lys
435 440 445 

Asp Leu Ala Asp Val Glu Arg Phe Phe Arg Thr Leu Asn Ser Glu Leu 
450 455 460 

Asp Val Arg Pro Ile Arg His Arg Leu Ala Asp Arg Val Arg Ala His
465 470 475 480 

Met Phe Leu His Met Leu Ser Tyr Tyr Ile Ser Trp His Met Lys Gln
485 490 495 

Ala Leu Ala Pro Ile Leu Phe Thr Asp Asn Asp Lys Pro Ala Ala Ala
500 505 510 

Ala Lys Arg Ala Asp Pro Val Ala Pro Ala Gln Arg Ser Asp Glu Ala
515 520 525 

Leu Asn Lys Ala Ala Arg Lys Arg Thr Glu Asp Asn Gln Pro Val His
530 535 540 

Ser Phe Thr Ser Leu Leu Thr Asp Leu Ala Thr Ile Cys Ala Asn Tyr
545 550 555 560 

Ile Gln Pro Thr Asp Asp Leu Pro Ala Phe Thr Lys Thr Thr Thr Pro
565 570 575 

Thr Pro Thr Gln Arg Arg Ala Phe Asp Leu Leu Ala Val Ser His Arg
580 585 590 



His Gly Leu Ala 
595 
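The codon rows of the nucleotide listing and the residues of SEQ ID NO: 20 should agree codon for codon, so a short script can flag rows where a scanned or retyped copy has drifted. The sketch below assumes the bacterial genetic code (NCBI translation table 11), whose sense-codon assignments are identical to the standard table; like the listing, it reads the gtg start codon as Val rather than as an initiator Met. `translate_row` and `check_row` are hypothetical helper names, not functions from any library.

```python
from itertools import product

# Codon table built from the standard genetic code (NCBI translation table 1).
# Table 11 (bacteria) assigns every sense codon the same amino acid, so this
# also fits a Mycobacterium coding sequence; gtg is translated as Val here.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def translate_row(codons: str) -> list[str]:
    """Translate a space-separated row of codons into three-letter residue names."""
    return [THREE_LETTER[CODON_TABLE[c.lower()]] for c in codons.split()]


def check_row(codons: str, residues: str) -> list[int]:
    """Return 0-based positions where a codon's translation differs from the listed residue."""
    got = translate_row(codons)
    want = residues.split()
    if len(got) != len(want):
        raise ValueError(f"{len(got)} codons but {len(want)} residues")
    return [i for i, (g, w) in enumerate(zip(got, want)) if g != w]


if __name__ == "__main__":
    print(check_row("gtg tca tcc tct gat gac", "Val Ser Ser Ser Asp Asp"))
    print(check_row(
        "cta acc gac gag cgc gcc cgc aaa cgc gcc gag ctg ctg gcg gcc acc",
        "Leu Ala Asp Glu Arg Ala Arg Lys Arg Ala Glu Leu Leu Ala Ala Thr",
    ))
```

The first row checks clean (an empty list). For the row covering residues 343 to 358, the script reports position 1: the codon printed there, acc, encodes Thr, whereas SEQ ID NO: 20 gives Ala at residue 344, which points to a likely transcription error in one of the two copies rather than a real difference.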